vSphere bug can bring down entire cluster (not fixed in Update 1).
Last month I was on-site at one of my customers and ran into a major problem in their vSphere environment. Suddenly about 150 virtual servers, running on 16 hosts in 2 clusters (on the same SAN), started going into jabber mode. They froze, which made them unreachable for pings and other traffic. Each freeze lasted between 1 and 30 seconds, after which the VMs came back online. This seemed to happen in groups of about 10 to 30 VMs at a time.
Pretty soon we saw in the logs that a VMFS LUN had been removed by one of the SAN administrators. This LUN was still attached to all the ESX hosts in that cluster but was not “in use”, meaning there were no VMs running on it.
Of course, removing a LUN at the SAN level without first detaching it from the ESX hosts is not the right way to do this, but what I just described still shouldn’t have happened!
The quick solution was …



