Designing in a Distributed World - The Practice of Cloud System Administration

The load balancer must always know which backends are alive and ready to accept

requests. Load balancers send health check queries dozens of times each second and stop

sending traffic to a backend if its health check fails. A health check is a simple query

that should execute quickly and return whether the system should receive traffic.
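
As a rough illustration of the filtering step, the sketch below keeps only the backends whose health check passes. The function name healthy_backends and the probe callable are hypothetical; a real probe would typically be a quick network query, and a real load balancer would run it continuously rather than on demand.

```python
from typing import Callable, List


def healthy_backends(backends: List[str], probe: Callable[[str], bool]) -> List[str]:
    """Return only the backends whose health check currently passes."""
    alive = []
    for backend in backends:
        try:
            ok = probe(backend)
        except Exception:
            ok = False  # a probe that errors out counts as a failed check
        if ok:
            alive.append(backend)
    return alive


# Hypothetical probe: pretend backend "b" is down.
print(healthy_backends(["a", "b", "c"], lambda name: name != "b"))  # ['a', 'c']
```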

Picking which backend to send a query to can be simple or complex. A simple method

would be to alternate among the backends in a loop, a practice called round-robin. Some

backends may be more powerful than others, however, and may be selected more often

using a proportional round-robin scheme. More complex solutions include the least loaded

scheme. In this approach, a load balancer tracks how loaded each backend is and always

selects the least loaded one.
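
The three selection schemes described above can be sketched in a few lines. This is a minimal illustration, not a production scheduler: the function names, the integer weights, and the dictionary of reported loads are all assumptions made for the example.

```python
import itertools
from typing import Dict, Iterator, List


def round_robin(backends: List[str]) -> Iterator[str]:
    """Cycle through the backends in order, forever."""
    return itertools.cycle(backends)


def proportional_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Cycle through the backends, repeating each one according to its weight."""
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(schedule)


def least_loaded(loads: Dict[str, float]) -> str:
    """Pick the backend that reports the lowest load."""
    return min(loads, key=loads.get)


rr = round_robin(["a", "b", "c"])
print([next(rr) for _ in range(5)])       # ['a', 'b', 'c', 'a', 'b']

prr = proportional_round_robin({"big": 2, "small": 1})
print([next(prr) for _ in range(6)])      # ['big', 'big', 'small', 'big', 'big', 'small']

print(least_loaded({"a": 0.8, "b": 0.3, "c": 0.5}))  # b
```

The proportional variant here simply repeats a backend in the rotation; the least loaded variant is only as good as the load numbers it is given.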

Selecting the least loaded backend sounds reasonable, but a naive implementation can

be a disaster. A backend may not show signs of being overloaded until long after it has

actually become overloaded. This problem arises because it can be difficult to accurately

measure how loaded a system is. If the load is a measurement of the number of connections

recently sent to the server, this definition is blind to the fact that some connections may be

long lasting while others may be quick. If the measurement is based on CPU utilization,

this definition is blind to input/output (I/O) overload. Often a trailing average of the last 5

minutes of load is used. Trailing averages have a problem in that, as an average, they

reflect the past, not the present. As a consequence, a sharp, sudden increase in load will not

be reflected in the average for a while.
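
To make the lag concrete, here is a small worked example, assuming one load sample per minute and a 5-sample window. The helper trailing_average and the sample values are invented for illustration.

```python
from collections import deque


def trailing_average(samples, window):
    """Yield the mean of the most recent `window` samples after each new sample."""
    recent = deque(maxlen=window)  # old samples fall off the left automatically
    for sample in samples:
        recent.append(sample)
        yield sum(recent) / len(recent)


# Load is steady at 20 percent, then jumps to 100 percent at minute 6.
load = [20, 20, 20, 20, 20, 100, 100, 100, 100, 100]
averages = list(trailing_average(load, window=5))
print(averages)
# [20.0, 20.0, 20.0, 20.0, 20.0, 36.0, 52.0, 68.0, 84.0, 100.0]
```

In the first minute after the jump the backend is fully loaded, yet the average reports only 36 percent. It takes the full window before the average catches up.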

Imagine a load balancer with 10 backends. Each one is running at 80 percent load. A

new backend is added. Because it is new, it has no load and, therefore, is the least loaded

backend. A naive least loaded algorithm would send all traffic to this new backend; no

traffic would be sent to the other 10 backends. All too quickly, the new backend would

become absolutely swamped. There is no way a single backend could process the traffic

previously handled by 10 backends. The use of trailing averages would mean the older

backends would continue reporting artificially high loads for a few minutes while the new

backend would be reporting an artificially low load.

With this scheme, the load balancer will believe that the new machine is less loaded than

all the other machines for quite some time. In such a situation the machine may become so

overloaded that it would crash and reboot, or a system administrator trying to rectify the

situation might reboot it. When it returns to service, the cycle would start over again.
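
The failure mode can be reproduced with a toy simulation. The sketch assumes the reported loads are trailing averages that do not move during a short burst of requests; the names route_naive, old-0 through old-9, and new are hypothetical.

```python
from collections import Counter


def route_naive(reported_load, requests):
    """Send each request to whichever backend reports the lowest load."""
    sent = Counter()
    for _ in range(requests):
        target = min(reported_load, key=reported_load.get)
        sent[target] += 1
    return sent


# Ten established backends at 80 percent, plus one new backend reporting no load.
# The reported values stay fixed here because the averages lag behind reality.
reported = {f"old-{i}": 0.80 for i in range(10)}
reported["new"] = 0.0

sent = route_naive(reported, requests=1000)
print(sent)  # Counter({'new': 1000})
```

Every one of the 1000 requests lands on the new backend, which is the situation the text describes.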

Such situations make the round-robin approach look pretty good. A less naive least

loaded implementation would have some kind of control in place that would never send

more than a certain number of requests to the same machine in a row. This is called a slow

start algorithm.
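
One way to sketch the control described above is to cap how many times in a row the same backend may be chosen. This is only an illustration of that description, under the same assumption of stale reported loads; route_with_cap and max_in_a_row are hypothetical names, and real load balancers may implement slow start differently.

```python
def route_with_cap(reported_load, requests, max_in_a_row):
    """Least loaded selection that avoids picking the same backend more than
    `max_in_a_row` times consecutively."""
    choices = []
    last, streak = None, 0
    for _ in range(requests):
        candidates = reported_load
        if streak >= max_in_a_row:
            # The favorite has hit its cap, so choose among the others this time.
            others = {b: load for b, load in reported_load.items() if b != last}
            # With a single backend there is no alternative; fall back to it.
            candidates = others or reported_load
        target = min(candidates, key=candidates.get)
        streak = streak + 1 if target == last else 1
        last = target
        choices.append(target)
    return choices


reported = {"old-0": 0.80, "old-1": 0.80, "new": 0.0}
print(route_with_cap(reported, requests=7, max_in_a_row=3))
# ['new', 'new', 'new', 'old-0', 'new', 'new', 'new']
```

The new backend still receives most of the traffic in this toy run, so the cap limits the burst rather than balancing the load; a practical implementation would also need the cap to be low enough for a new machine to absorb.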
