Channel: VMware Communities : Discussion List - All Communities

ESXi 5.1/5.5 High CPU usage and CPU ready time


Hey guys,

 

I have a very weird issue with VM performance and I haven't been able to narrow it down at all. This is going to be a long post, so I hope you will bear with me so you can fully understand the topology and the issue.

 

Topology:

2 ESXi hosts + 2 switches + storage with 2 SPs

Each host has 2 physical CPUs with 4/6 cores.

One host has 64GB of RAM and the other has 82GB of RAM.

Each host is running 10-13 VMs.

Each host has at least 4 dedicated NICs going to the storage (two per SP via two dedicated switches, one switch per controller).

A firewall is connected to both hosts, and the VLANs are defined on that firewall.

 

General workflow:

Each department / "company" has its own VLAN on the firewall with applicable rules, and the same VLAN is configured on both switches and on the ESXi hosts.

This is how we control and limit the traffic between internal departments; we treat each of them as a separate company or client.

 

Most VMs are used for Terminal Services. About 80% of all the VMs are RDS machines that run some sort of business application (ERP/CRM/accounting etc.). Each machine has ESET AV and Office installed.


The Issue:

When users connect to these terminal servers and start working, the software (ERP/CRM/accounting etc.) gets stuck / freezes, sometimes for 30 seconds, and then suddenly goes back to working. This issue occurs on ALL the machines, but not at the same time.

I mean some machines will get stuck every X minutes / hours and some will get stuck every few days.


When looking at this issue we noticed that every time the machines get stuck, the CPU usage is at 100%. We also noticed that the CPU ready time is very high.

How high? On average it is 3500-10500 for the whole host, with most VMs over 100 and a few VMs over 600-700. This happens on both hosts, with different VMs.
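
For anyone trying to judge these numbers: the post does not say which chart or units they come from. If they are the "CPU Ready" summation values from the vSphere real-time performance chart (milliseconds per 20-second sample), they can be turned into a percentage with the usual summation-to-percent conversion. The sketch below rests on that assumption; the function name and the default interval are illustrative only.

```python
def ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU ready summation value (ms) into a percentage.

    ASSUMPTION: ready_ms is a summation counter in milliseconds collected
    over one chart sample of interval_s seconds (20 s for real-time charts).
    Dividing by vcpus gives an average per-vCPU figure.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000.0) / vcpus * 100.0


if __name__ == "__main__":
    # Per-VM values quoted in the post, treated as 1-vCPU VMs.
    for ms in (100, 600, 700):
        print(f"{ms} ms ready -> {ready_percent(ms):.2f}% of a 20 s sample")
```

The host-wide figures (3500-10500) are sums across every VM on the host, so they are not directly comparable with the per-VM values and should not be fed into the per-VM conversion as-is.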


Steps taken to try and resolve this issue:

We first started by removing the excess vCPUs from the VMs. Before, each VM had 4 vCPUs (2 vSockets with 2 cores per socket). We lowered that to 1 vSocket with 1 core, and for some machines that performed really badly we increased it to 2 vSockets with 1 core each. We also tried 1 vSocket with 2 cores. None of that helped; the issues continued.
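
To put the vCPU change in perspective, here is a rough sketch of the vCPU-to-physical-core ratio before and after the reduction. It assumes "4/6 cores" means 4 or 6 cores per physical CPU, takes the upper bound of 13 VMs per host, and ignores hyperthreading; the function name is made up for illustration.

```python
def vcpu_to_core_ratio(vms, vcpus_per_vm, sockets, cores_per_socket):
    """Return total vCPUs divided by total physical cores on one host."""
    physical_cores = sockets * cores_per_socket
    if physical_cores <= 0:
        raise ValueError("host must have at least one physical core")
    return (vms * vcpus_per_vm) / physical_cores


if __name__ == "__main__":
    # ASSUMPTION: 13 VMs on a host with 2 sockets of 4 or 6 cores each.
    for cores in (4, 6):
        before = vcpu_to_core_ratio(13, 4, 2, cores)  # 4 vCPUs per VM
        after = vcpu_to_core_ratio(13, 1, 2, cores)   # 1 vCPU per VM
        print(f"2 x {cores} cores: {before:.2f}:1 before, {after:.2f}:1 after")
```

With these assumed inputs the ratio on the 4-core host drops from 6.5:1 to about 1.6:1, which is why the continued high ready time after the reduction is surprising.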

 

We tried uninstalling the backup agents from the VMs, applying Windows updates, and removing Windows updates. None of it helped.

 

When none of that helped, we brought in a third host. This new host has only 2 NICs, so we plugged one into the storage and one into the FW. We moved two machines over and the problem still happens; less often, but it happens.

 

We moved all the VMs back, disconnected the host from the FW, connected it directly with one cable to the main office switch, and put two VMs on local disks. This worked perfectly; everything was running nice and smooth.

We then moved these two VMs back onto the storage but kept the host off the FW, and the machines kept working great.

 

We thought this was a firewall issue. We got a new FW unit, connected it to the main switch, and put those two VMs behind it. We also added three (3) more VMs, as people had really started to complain.

We now have two (2) VMs running on local SAS disks and three (3) VMs running on the storage, which also has SAS disks.

 

Lo and behold, the issues are back: high CPU and high ready time. The current avg on the 3rd host is 1500-2000, but the two original hosts are still at an avg of 2500-9000.

 

IF ANYONE HAS ANY IDEA ON HOW TO PROCEED OR SOLVE THIS ISSUE, PLEASE REPLY AND HELP

 

THANK YOU

 

-Dave

