vSphere HA primarily targets ESXi host failures, but it can also be used to protect against
VM- and application-level failures. In all cases, vSphere HA uses a restart of the VM as the
mechanism for addressing the detected failure. This means there is a period of downtime when
a failure occurs. Unfortunately, you can't calculate the exact duration of the downtime because it
is unknown ahead of time how long it will take to boot a VM or a series of VMs. From this you
can gather that vSphere HA might not provide the same level of high availability found in other
high-availability solutions. Further, when a failover occurs between ESXi hosts as a result of the
vSphere HA feature, there is a slight potential for data loss and/or filesystem corruption because
the VM was immediately powered off when the server failed and then brought back up minutes
later on another server. However, given the journaling filesystems in use by Windows and many
distributions of Linux, this possibility is relatively slim.
vSphere HA Experience in the Field
Author Nick Marshall says, “I want to mention my own personal experience with vSphere HA and
the results I encountered. Your mileage might vary, but this should give you a reasonable idea
of what to expect. I had a VMware ESXi host that was a member of a five-node cluster. This
node crashed some time during the night, and when the host went down, it took anywhere from
15 to 20 VMs with it. vSphere HA kicked in and restarted all the VMs as expected.
“What made this an interesting experience is that the crash must have happened right after the
polling of the monitoring and alerting server. All the VMs that were on the general alerting schedule
were restarted without triggering any alerts. Some of the VMs with more aggressive monitoring
did trip alerts, but they were recovered before anyone was able to log in to the system and
investigate. I tried to argue the point that if an alert never fired, did the downtime really happen? I did
not get too far with that argument, but I was pleased with the results.
“In another case, during testing I had a VM running on a two-node cluster. I pulled the power cords
on the host that the VM was running on to create the failure. My time to recovery from pull to ping
was between 5 and 6 minutes. That's not too bad for general use but not good enough for all cases.
vSphere Fault Tolerance can now fill that gap for even the most important and critical servers in
your environment. We'll talk more about vSphere FT in a bit.”
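The first anecdote comes down to simple arithmetic: a monitoring server that polls on a fixed schedule sees an outage only if at least one poll lands inside the outage window. The following minimal Python sketch illustrates that idea. The function name, the polling intervals, and the assumption that polls fire at minute 0 and at every whole interval afterward are all hypothetical. Only the 6-minute recovery time is taken from the test described above.

```python
import math


def polls_during_outage(outage_start, outage_duration, poll_interval):
    """Return the poll times (in whole minutes) that fall inside an outage.

    Assumptions (illustrative only, not taken from any monitoring product):
    - polls fire at minute 0, poll_interval, 2 * poll_interval, ...
    - the outage covers [outage_start, outage_start + outage_duration)
    - all arguments are non-negative integers, and poll_interval > 0
    """
    outage_end = outage_start + outage_duration
    # First scheduled poll at or after the start of the outage.
    first_poll = math.ceil(outage_start / poll_interval) * poll_interval
    return list(range(first_poll, outage_end, poll_interval))


# A host fails 1 minute after a poll and its VMs are back 6 minutes later.
general_schedule = polls_during_outage(1, 6, poll_interval=10)
aggressive_schedule = polls_during_outage(1, 6, poll_interval=1)

print("10-minute polling saw the outage:", bool(general_schedule))
print("1-minute polling saw the outage:", bool(aggressive_schedule))
```

Under these assumed numbers, the 10-minute schedule never polls during the outage, while the 1-minute schedule does. This matches the pattern in the anecdote, where only the VMs with more aggressive monitoring raised alerts.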
Understanding vSphere HA's Underpinnings
On the surface, the functionality of vSphere HA is similar to the functionality provided in previous
versions of vSphere. Under the covers, though, starting with vSphere 5.0, HA uses a new
VMware-developed tool known as Fault Domain Manager (FDM). FDM was developed from the ground
up to replace Automated Availability Manager (AAM), which powered vSphere HA in earlier
versions of vSphere. AAM had a number of notable limitations, including a strong dependence
on name resolution and scalability limits. FDM was developed to address these limitations
 