My client is seeing storage I/O latency messages on their systems at low-utilization times. The messages are similar to those referenced in http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007236.
They are running 4 IBM HS23 ESXi 5.0 U1 hosts, each with 2 CPUs, 16 cores, and 128 GB of RAM, using QLogic Fibre Channel SAN adapters. Storage is accessed via a pair of QLogic switches, with 2 uplinks per switch to the IBM DS3500 storage. The DS3500 is configured for RAID 5 with an extra hot spare drive.
We originally placed a call to IBM and had the system evaluated for any issues. At this point the system is performing "in the top 10% of storage systems" per IBM support. IBM support has also indicated that these latency messages should be ignored if they are not accompanied by other storage-related issues. My client wants a better answer. They opened a ticket with VMware support, only to be told to go and talk with IBM because there must be something wrong with the storage system.
The IBM storage and the QLogic FC switches are not reporting errors or I/O issues of any kind.
At this point the client is annoyed but reluctantly willing to accept IBM's opinion on this issue. I would like some closure and to be able to point future clients in the right direction.
Below is a screen capture of the I/O latency messages we are receiving.
My opinion: considering that this alert simply reports a delta of 20% or more between latency measurements, I am willing to just ignore the message. You can easily see a large swing in I/O latency measurements at low-I/O times. All you need is for nothing to be going on at measurement 1 and a write to happen in conjunction with measurement 2.
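To illustrate the arithmetic behind that opinion, here is a minimal Python sketch. It is not VMware's actual detection logic. The 20% threshold is my reading of the KB, the sample latencies are made-up numbers, and the names percent_change and exceeds_threshold are hypothetical. It only shows that the same absolute change in latency is a huge percentage swing when the baseline is small.

```python
# Sketch only: relative change between two latency samples.
# THRESHOLD_PERCENT is an assumption (my reading of KB 2007236),
# and all sample values below are invented for illustration.

THRESHOLD_PERCENT = 20.0


def percent_change(previous_us, current_us):
    """Return the change from previous_us to current_us as a percentage."""
    if previous_us <= 0:
        raise ValueError("previous sample must be positive")
    return (current_us - previous_us) * 100.0 / previous_us


def exceeds_threshold(previous_us, current_us, threshold=THRESHOLD_PERCENT):
    """True if the swing between the two samples is at least `threshold` percent."""
    return abs(percent_change(previous_us, current_us)) >= threshold


if __name__ == "__main__":
    # Hypothetical pairs of (measurement 1, measurement 2) in microseconds.
    samples = {
        "idle array, one write lands": (500, 1500),
        "busy array, same absolute rise": (10000, 11000),
    }
    for label, (first, second) in samples.items():
        change = percent_change(first, second)
        flagged = exceeds_threshold(first, second)
        print(f"{label}: {change:.1f}% change, flagged={flagged}")
```

Both pairs differ by the same 1000 microseconds, but only the idle case crosses the assumed threshold, which is why I suspect these messages show up mostly when nothing much is happening.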
Does anyone have another opinion or a better explanation of KB 2007236?