Design Patterns for Resiliency - The Practice of Cloud System Administration

6.6.3 Load Balancers

Whether a server fails because of a dead machine, a network issue, or a bug, a resilient way to deal with this failure is by use of replicas and some kind of load balancer. The same load balancer described previously to gain scale is also used to gain resiliency. However, when using this approach to gain scale, each replica added was intended to add capacity that would be used. Now we are adding spare capacity that is an insurance policy we hope not to use.

When using a load balancer, it is important to consider whether it is being used for scaling, resiliency, or both. We have observed situations where it was assumed that the presence of a load balancer means the system scales and is resilient automatically. This is not true. The load balancer is not magic. It is a technology that can be used for many different things.

Scale versus Resiliency

If we are load balancing over two machines, each at 40 percent utilization, then either machine can die and the remaining machine will be 80 percent utilized. In such a case, the load balancer is used for resiliency.

If we are load balancing over two machines, each at 80 percent utilization, then there is no spare capacity available if one goes down. If one machine died, the remaining replica would receive all the traffic, which is 160 percent of what the machine can handle. The machine will be overloaded and may cease to function. Two machines each at 80 percent utilization represents an N + 0 configuration. In this situation, the load balancer is used for scale, not resiliency.

In both of the previous examples, the same configuration was used: two machines and a load balancer. Yet in one case resiliency was achieved and in the other case scale was achieved. The difference between the two was the utilization, or traffic, being processed. In other words, 50 percent utilization is 100 percent full when you have only two servers and need to survive the loss of one.

If we take the second example and add a third replica but the amount of traffic does not change, then 160 percent of the total 300 percent capacity is in use. This is an N + 1 configuration since one replica can die and the remaining replicas can still handle the load. In this case, the load balancer is used for both scale and resiliency.
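The arithmetic behind the three examples above can be checked with a short Python sketch. The function names `replicas_needed` and `describe` are illustrative and do not come from the text, and load is expressed as a percentage of one machine's capacity, so two machines at 40 percent each carry a total load of 80.

```python
def replicas_needed(total_load_pct: int) -> int:
    """Smallest number of replicas (N) that can carry the load at <= 100% each."""
    # Ceiling division on integers; always require at least one replica.
    return max(1, -(-total_load_pct // 100))


def describe(replicas: int, total_load_pct: int) -> str:
    """Label a configuration as 'N + M', where M is the count of spare replicas."""
    spare = replicas - replicas_needed(total_load_pct)
    if spare < 0:
        return "overloaded"
    return f"N + {spare}"


print(describe(2, 80))    # two machines at 40% each  -> N + 1
print(describe(2, 160))   # two machines at 80% each  -> N + 0
print(describe(3, 160))   # same traffic, third replica added -> N + 1
```

The hardware in the first two calls is identical. Only the traffic changes, and with it the label.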

A load balancer provides scale when we use it to keep up with capacity, and resiliency when we use it to exceed capacity. If utilization increases and we have not added additional replicas, we run the risk of no longer being able to claim resiliency. If traffic is high during the day and low at night, we can end up with a system that is resilient during some hours of the day and not others.
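To illustrate how resiliency can come and go with the time of day, here is a minimal Python sketch. The hourly traffic figures are invented for the example, and `survives_one_failure` and `non_resilient_hours` are hypothetical helper names. Load is again measured as a percentage of one machine's capacity.

```python
def survives_one_failure(replicas: int, total_load_pct: int) -> bool:
    """True if the remaining replicas can carry the load after one replica dies."""
    return replicas >= 2 and total_load_pct <= (replicas - 1) * 100


def non_resilient_hours(replicas: int, hourly_load_pct: dict) -> list:
    """Hours of the day at which losing one replica would overload the rest."""
    return [
        hour
        for hour, load in sorted(hourly_load_pct.items())
        if not survives_one_failure(replicas, load)
    ]


# Hypothetical traffic profile: hour of day -> total load (made-up numbers).
HOURLY_LOAD_PCT = {3: 60, 9: 130, 14: 160, 22: 90}

print(non_resilient_hours(2, HOURLY_LOAD_PCT))  # [9, 14]
print(non_resilient_hours(3, HOURLY_LOAD_PCT))  # []
```

With two replicas, this made-up profile is resilient only at night. Adding a third replica makes it resilient at every sampled hour.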
