Ensuring High Availability and Business Continuity - Mastering VMware vSphere 5.5 - page 396

vSphere HA primarily targets ESXi host failures, but it can also be used to protect against VM- and application-level failures. In all cases, vSphere HA uses a restart of the VM as the mechanism for addressing the detected failure. This means there is a period of downtime when a failure occurs. Unfortunately, you can't calculate the exact duration of the downtime because it is unknown ahead of time how long it will take to boot a VM or a series of VMs. From this you can gather that vSphere HA might not provide the same level of high availability found in other high-availability solutions. Further, when a failover occurs between ESXi hosts as a result of the vSphere HA feature, there is a slight potential for data loss and/or filesystem corruption because the VM was immediately powered off when the server failed and then brought back up minutes later on another server. However, given the journaling filesystems in use by Windows and many distributions of Linux, this possibility is relatively slim.
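Although the exact downtime can't be known in advance, a best-case and worst-case recovery time can still be turned into a rough availability range. The following is a minimal sketch, not a VMware formula: the function names, the failure rate, and the recovery times are all hypothetical inputs you would replace with your own estimates.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def availability(failures_per_year: float, recovery_minutes: float) -> float:
    """Fraction of a year a VM is up if every failure costs recovery_minutes."""
    downtime_minutes = failures_per_year * recovery_minutes
    return 1.0 - downtime_minutes / MINUTES_PER_YEAR


def availability_range(failures_per_year: float,
                       best_minutes: float,
                       worst_minutes: float) -> tuple[float, float]:
    """Return (worst-case, best-case) availability for a restart-based recovery."""
    return (availability(failures_per_year, worst_minutes),
            availability(failures_per_year, best_minutes))


if __name__ == "__main__":
    # Hypothetical: two host failures a year, 3 to 10 minutes to restart and boot.
    low, high = availability_range(2, best_minutes=3, worst_minutes=10)
    print(f"availability between {low:.6%} and {high:.6%}")
```

Because the restart time is only an estimate, the result is best read as a range rather than a guarantee.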

vSphere HA Experience in the Field

Author Nick Marshall says, “I want to mention my own personal experience with vSphere HA and the results I encountered. Your mileage might vary, but this should give you a reasonable idea of what to expect. I had a VMware ESXi host that was a member of a five-node cluster. This node crashed some time during the night, and when the host went down, it took anywhere from 15 to 20 VMs with it. vSphere HA kicked in and restarted all the VMs as expected.

“What made this an interesting experience is that the crash must have happened right after the polling of the monitoring and alerting server. All the VMs that were on the general alerting schedule were restarted without triggering any alerts. Some of the VMs with more aggressive monitoring did trip alerts, but they had recovered before anyone was able to log into the system and investigate. I tried to argue the point that if an alert never fired, did the downtime really happen? I did not get too far with that argument, but I was pleased with the results.

“In another case, during testing I had a VM running on a two-node cluster. I pulled the power cords on the host that the VM was running on to create the failure. My time to recovery from pull to ping was between 5 and 6 minutes. That's not too bad for general use but not good enough for all cases. vSphere Fault Tolerance can now fill that gap for even the most important and critical servers in your environment. We'll talk more about vSphere FT in a bit.”
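The alerting behavior in the first anecdote comes down to timing: a poll-based monitor notices an outage only if at least one poll falls inside the outage window. The sketch below illustrates that idea under stated assumptions. The function name and the polling intervals are hypothetical and are not taken from any particular monitoring product; only the 5-to-6-minute recovery time echoes the figure quoted above.

```python
import math


def polls_during_outage(outage_start: float,
                        outage_duration: float,
                        poll_interval: float,
                        first_poll: float = 0.0) -> list[float]:
    """Return the poll times (in seconds) that land inside the outage window.

    Polls happen at first_poll, first_poll + poll_interval, and so on.
    The outage window is [outage_start, outage_start + outage_duration).
    """
    outage_end = outage_start + outage_duration
    # Index of the first poll at or after the start of the outage.
    k = max(0, math.ceil((outage_start - first_poll) / poll_interval))
    hits = []
    t = first_poll + k * poll_interval
    while t < outage_end:
        hits.append(t)
        k += 1
        t = first_poll + k * poll_interval
    return hits


if __name__ == "__main__":
    # Hypothetical: host fails 1 second after a poll; VMs are back in 330 s.
    general = polls_during_outage(1, 330, poll_interval=600)    # 10-minute polling
    aggressive = polls_during_outage(1, 330, poll_interval=60)  # 1-minute polling
    print("general schedule saw the outage:", bool(general))
    print("aggressive schedule saw the outage:", bool(aggressive))
```

With these assumed numbers, the 10-minute schedule never polls during the outage, while the 1-minute schedule does, which matches the pattern described in the anecdote.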

Understanding vSphere HA's Underpinnings

On the surface, the functionality of vSphere HA is similar to the functionality provided in previous versions of vSphere. Under the covers, though, starting with vSphere 5.0, HA uses a new VMware-developed tool known as Fault Domain Manager (FDM). FDM was developed from the ground up to replace Automated Availability Manager (AAM), which powered vSphere HA in earlier versions of vSphere. AAM had a number of notable limitations, including a strong dependence on name resolution and scalability limits. FDM was developed to address these limitations
