Hi All,
I just had an interesting issue and thought I would share it, as it might save you having to co-ordinate planned downtime that could potentially be avoided.
We had a disconnected host where the guest VMs were all still running and accessible via RDP, but the host was unresponsive through the iLO / DCUI, and SSH was not running and could not be started.
The host logged the following sequential events in vCenter:
The root filesystem's file table is full. As a result, the file tmp:/auto-backup.1481830/etc/hosts could not be created by the application 'tar'.
The root filesystem's file table is full. As a result, the file tmp:/auto-backup.1482016/etc/sfcb/repository/root/interop/cim_listenerdestinationcimxml.idx could not be created by the application 'tar'.
The root filesystem's file table is full. As a result, the file tmp:/auto-backup.1482194/etc/vmware/hostd/vmAutoStart.xml could not be created by the application 'tar'.
The root filesystem's file table is full. As a result, the file /etc/vmware/esx.conf.LOCK.17554 could not be created by the application 'hostd-worker'.
The root filesystem's file table is full. As a result, the file /var/log/ipmi/0/.sensor_threshold.raw could not be created by the application 'sfcb-vmware_raw'.
The root filesystem's file table is full. As a result, the file /var/log/ipmi/0/.sensor_hysteresis.raw could not be created by the application 'sfcb-vmware_raw'.
The root filesystem's file table is full. As a result, the file /var/run/sfcb/52c25dd2-064a-abee-ce4c-cafd051d527c could not be created by the application 'sfcb-CIMXML-Pro'.
The root filesystem's file table is full. As a result, the file /var/log/ipmi/0/.sel_header.raw could not be created by the application 'sfcb-vmware_raw'.
The root filesystem's file table is full. As a result, the file /var/run/sfcb/52ca5a12-1d8d-7902-1e14-170d2c282951 could not be created by the application 'sfcb-CIMXML-Pro'.
The root filesystem's file table is full. As a result, the file /var/log/ipmi/0/.sensor_readings.raw could not be created by the application 'sfcb-vmware_raw'.
The root filesystem's file table is full. As a result, the file /etc/vmware/esx.conf.LOCK.17554 could not be created by the application 'hostd-worker'.
Unable to apply DRS resource settings on host. A general system error occurred: Invalid fault. This can significantly reduce the effectiveness of DRS.
The root filesystem's file table is full. As a result, the file /var/run/sfcb/523777d0-72dc-9e0b-c6b0-9d32a5255317 could not be created by the application 'sfcb-CIMXML-Pro'.
The root filesystem's file table is full. As a result, the file /var/run/sfcb/52fc39a4-62d0-866e-50a3-663209c9ca28 could not be created by the application 'sfcb-CIMXML-Pro'.
The vSphere HA availability state of this host has changed to Unreachable
Host is not responding
Alarm 'Host connection state' on myhost.mydomain changed from Green to Red
Alarm 'Host connection state' on myhost.mydomain sent email to myemail@mydomain
vSphere HA agent for this host has an error: The vSphere HA agent is not reachable from vCenter Server
Alarm 'vSphere HA host status' on myhost.mydomain changed from Green to Red
vSphere HA agent for this host has an error: The vSphere HA agent is not reachable from vCenter Server
Cannot scan the host myhost.mydomain because its power state is unknown.
Host is not responding
I found this KB article, but was unable to start the process as I couldn't SSH onto the host.
Since I knew which guests were running on the affected host, I contacted the business and arranged emergency downtime to shut these guests down so that I could power cycle the host and deal with the issue. After lots of co-ordination we finally agreed on a suitable time which satisfied all business areas, and started the remediation.
Now here is the interesting part ... within seconds of shutting down guest VMs with a simple for loop and the shutdown command, the host status changed to Green and it was connected to vCenter again.
for /f %i in (C:\_temp\targets.txt) do shutdown -s -m \\%i -t 0 -f
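(For anyone not familiar with shutdown.exe: -s shuts down the target, -m \\%i points it at each remote machine listed in targets.txt, -t 0 sets a zero-second countdown, and -f forces running applications to close.)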
I enabled SSH and ran "stat -f /" - results below:
~ # stat -f /
File: "/"
ID: 1 Namelen: 127 Type: visorfs
Block size: 4096
Blocks: Total: 449852 Free: 324368 Available: 324368
Inodes: Total: 8192 Free: 55
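If you want to dig into where the inodes are actually going, something along these lines should work from the ESXi shell (just a rough sketch - the directory list is only an example; going by the events above, /var/run/sfcb looks like the prime suspect here):
~ # for d in /etc /var/run/sfcb /var/log /tmp; do echo -n "$d: "; find $d 2>/dev/null | wc -l; done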
After working through the above-mentioned KB article, the inodes were still exhausted:
/var/run/sfcb # stat -f /
File: "/"
ID: 1 Namelen: 127 Type: visorfs
Block size: 4096
Blocks: Total: 449852 Free: 324565 Available: 324565
Inodes: Total: 8192 Free: 122
Now that the host was available again, I put it into Maintenance Mode, rebooted it, and checked again after the reboot - plenty of free inodes:
~ # stat -f /
File: "/"
ID: 1 Namelen: 127 Type: visorfs
Block size: 4096
Blocks: Total: 449852 Free: 332942 Available: 332942
Inodes: Total: 8192 Free: 5721
All VMs that were shut down were then powered back on using PowerCLI.
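For anyone wanting to script the power-on side, something like this should do it in PowerCLI (a sketch only - myvcenter.mydomain is a placeholder and it assumes the same targets.txt list of VM names):
Connect-VIServer -Server myvcenter.mydomain
Get-Content C:\_temp\targets.txt | ForEach-Object { Start-VM -VM $_ }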
So the interesting point that could potentially be taken from this is that the next time this issue occurs, I might be able to resolve it by shutting down just one or a few running VMs rather than all of them ... perhaps shut down the lowest-priority non-production VMs first to see if that frees up enough inodes to get the host responsive again.
So, two questions:
- Is this logic flawed?
- Is there a method to monitor FREE inodes so that this can be caught in advance of it becoming an issue involving downtime?
Cheers, & happy new year!
Jon