I've got a very odd issue with an installation that has been stable for 6 months. I'm not sure whether it's a coincidence after implementing VDP, but now, 45 seconds after connecting, the ESXi 5.1.0 Build 799733 hosts disconnect from vCenter Server 5.1.0 Build 947673 and appear as Not Responding.
I can right-click each host and issue Connect, and the servers reconnect, but after 45 seconds they appear as Not Responding again.
vCenter Server is a fully patched virtual machine running on the two-host cluster, using a VMXNET3 network interface.
From the vCenter Server VM, I'm not seeing any ping timeouts to the hosts at the time of the disconnects. I can also connect directly to the ESXi servers with the vSphere Client with no issues; those sessions stay connected.
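Ping only exercises ICMP, so a repeatable TCP check from the vCenter VM can help confirm the management path stays up across the 45-second window. A minimal sketch, assuming the hosts answer on TCP 443 (the HTTPS port the vSphere Client uses); the 60-second duration and 5-second interval are arbitrary choices, not values from vCenter.

```python
import socket
import time
from datetime import datetime


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def watch(hosts, port=443, duration=60.0, interval=5.0):
    """Probe each host every `interval` seconds for `duration` seconds.

    Port 443 is an assumption (vSphere Client HTTPS); adjust as needed.
    """
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        stamp = datetime.now().isoformat(timespec="seconds")
        for host in hosts:
            state = "ok" if tcp_reachable(host, port) else "FAILED"
            print(f"{stamp} {host}:{port} {state}")
        time.sleep(interval)
```

Running `watch(["esx003.domain.ac.uk", "esx004.domain.ac.uk"])` right after issuing Connect would show whether TCP to the hosts ever fails at the moment they go Not Responding.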
DNS is okay and the default gateway is okay. The database is the default one installed with vCenter (Simple Install), SQL Express, and it is not full and has not reached its size limit.
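To make the "DNS is okay" check repeatable, a minimal sketch that resolves a host name forward, resolves the resulting IP back, and compares the two. It uses only the standard `socket.gethostbyname` and `socket.gethostbyaddr` calls; the resolvers are passed in as parameters (a hypothetical design choice) so the comparison can be exercised without a live DNS server.

```python
import socket
from typing import Callable, Tuple


def check_dns(
    fqdn: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, list, list]] = socket.gethostbyaddr,
) -> Tuple[bool, str]:
    """Resolve fqdn to an IP, resolve that IP back to a name, and compare."""
    try:
        ip = forward(fqdn)
        name, aliases, _ = reverse(ip)
    except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
        return False, f"{fqdn}: lookup failed ({exc})"
    # Compare case-insensitively and ignore a trailing root dot.
    names = {n.lower().rstrip(".") for n in [name, *aliases]}
    ok = fqdn.lower().rstrip(".") in names
    return ok, f"{fqdn} -> {ip} -> {name}"
```

Calling `check_dns("esx003.domain.ac.uk")` from the vCenter VM (and the equivalent for each host) returns `True` only when forward and reverse records agree.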
Nothing in the Windows Event Log...
Things tried:
1. Restarted the ESXi hosts
2. Restarted the network management agents
3. Upgraded vCenter Server to Build 947673
4. Restarted vCenter Server
5. Shut down all VMs, including VDP, except vCenter Server.
Haven't done the hosts yet!
In the vpxd logs I can see the following. I think it shows the disconnection, but I have no idea why it happens:
2013-01-17T13:08:37.741Z [06164 info 'vpxdvpxdVmomi' opID=SWI-6e140ec8] [ClientAdapterBase::InvokeOnSoap] Invoke done (esx003.domain.ac.uk, vpxapi.VpxaService.fetchQuickStats)
2013-01-17T13:08:37.741Z [06164 info 'vpxdvpxdVmomi' opID=SWI-6e140ec8] [ClientAdapterBase::InvokeOnSoap] Invoke done (esx004.domain.ac.uk, vpxapi.VpxaService.fetchQuickStats)
2013-01-17T13:08:39.659Z [06292 info 'vpxdvpxdHostCnx' opID=SWI-7a1cb616] [VpxdHostCnx] No heartbeats received from host 5203136b-c37d-c238-69c3-7101151dae9b within 4398798518000 ms
2013-01-17T13:08:39.659Z [06292 info 'vpxdvpxdHostCnx' opID=SWI-7a1cb616] [VpxdHostCnx] No heartbeats received from host 5274675a-2fb6-a9ae-7787-9f24c69ff6e5 within 4398798518000 ms
2013-01-17T13:08:39.659Z [06620 info 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost] Got lost connection callback for host-28
2013-01-17T13:08:39.659Z [04264 info 'commonvpxLro'] [VpxLRO] -- BEGIN task-internal-212 -- host-28 -- VpxdInvtHostSyncHostLRO.Synchronize --
2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-28
2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost::FixNotRespondingHost] Returning false since host is already fixed!
2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Failed to fix not responding host host-28
2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-28
2013-01-17T13:08:39.659Z [04264 error 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] FixNotRespondingHost failed for host host-28, marking host as notResponding
2013-01-17T13:08:39.659Z [06620 info 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost] Got lost connection callback for host-24
2013-01-17T13:08:39.659Z [06620 info 'commonvpxLro'] [VpxLRO] -- BEGIN task-internal-213 -- host-24 -- VpxdInvtHostSyncHostLRO.Synchronize --
2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-24
2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost::FixNotRespondingHost] Returning false since host is already fixed!
2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Failed to fix not responding host host-24
2013-01-17T13:08:39.659Z [06620 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-24
2013-01-17T13:08:39.659Z [06620 error 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] FixNotRespondingHost failed for host host-24, marking host as notResponding
2013-01-17T13:08:39.659Z [04264 warning 'vpxdvpxdMoHost'] [HostMo] host connection state changed to [NO_RESPONSE] for host-28
2013-01-17T13:08:39.784Z [04264 info 'vpxdvpxdMoHost'] [HostMo::SetComputeCompatibilityDirty] Marked host-28 as dirty.
2013-01-17T13:08:39.784Z [04264 info 'clustervpxdMoCluster'] [ClusterMo::SetDasCompatDirty] Marked domain-c37 as dirty.
2013-01-17T13:08:39.800Z [04264 info 'commonvpxLro'] [VpxLRO] -- FINISH task-internal-212 -- host-28 -- VpxdInvtHostSyncHostLRO.Synchronize --
2013-01-17T13:08:39.831Z [06620 warning 'vpxdvpxdMoHost'] [HostMo] host connection state changed to [NO_RESPONSE] for host-24
2013-01-17T13:08:39.987Z [06620 info 'vpxdvpxdMoHost'] [HostMo::SetComputeCompatibilityDirty] Marked host-24 as dirty.
2013-01-17T13:08:39.987Z [06620 info 'clustervpxdMoCluster'] [ClusterMo::SetDasCompatDirty] Marked domain-c37 as dirty.
2013-01-17T13:08:40.003Z [06620 info 'commonvpxLro'] [VpxLRO] -- FINISH task-internal-213 -- host-24 -- VpxdInvtHostSyncHostLRO.Synchronize --
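To line up the heartbeat messages with the state changes over a longer stretch of log, a minimal sketch that extracts the "No heartbeats received" and "[NO_RESPONSE]" lines. The regular expressions are written against the line format shown above and are an assumption; other vpxd builds may format lines differently.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Timestamp plus bracketed header, e.g.
# "2013-01-17T13:08:39.659Z [06292 info 'vpxdvpxdHostCnx' opID=...] message"
LINE_RE = re.compile(r"^(\S+Z) \[(\d+) (\w+) '([^']+)'[^\]]*\] (.*)$")
HEARTBEAT_RE = re.compile(r"No heartbeats received from host (\S+)")
NO_RESPONSE_RE = re.compile(r"connection state changed to \[NO_RESPONSE\] for (\S+)")


def parse_ts(text: str) -> datetime:
    """Parse the ISO-8601 'Z' timestamps used in the vpxd log."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


def find_events(lines: Iterable[str]) -> List[Tuple[datetime, str, str]]:
    """Return (timestamp, event, host id) for heartbeat and NO_RESPONSE lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like vpxd log entries
        ts, message = parse_ts(m.group(1)), m.group(5)
        hb = HEARTBEAT_RE.search(message)
        if hb:
            events.append((ts, "no-heartbeat", hb.group(1)))
        nr = NO_RESPONSE_RE.search(message)
        if nr:
            events.append((ts, "not-responding", nr.group(1)))
    return events
```

Passing an open copy of the vpxd log to `find_events` lists every transition; if the not-responding events recur about 45 seconds after each manual Connect, the log matches what the client shows.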
Any ideas?