The load balancer must always know which backends are alive and ready to accept requests.
Load balancers send health check queries to each backend dozens of times each second and
stop sending traffic to any backend whose health check fails. A health check is a simple query
that should execute quickly and return whether the system should receive traffic.
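A minimal sketch of this bookkeeping follows. The names `http_probe` and `HealthTracker`, the 0.5-second timeout, and the idea of a dedicated health URL are assumptions made for illustration; they are not taken from any particular load balancer.

```python
import urllib.request
from typing import Callable, Dict, List


def http_probe(url: str, timeout: float = 0.5) -> bool:
    """Hypothetical probe: healthy only if the URL answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Refused connection, timeout, HTTP error: all count as "do not send traffic".
        return False


class HealthTracker:
    """Remembers the result of the most recent health check for each backend."""

    def __init__(self, probes: Dict[str, Callable[[], bool]]):
        self.probes = probes
        # Backends are treated as unavailable until their first successful check.
        self.alive = {name: False for name in probes}

    def run_checks(self) -> None:
        for name, probe in self.probes.items():
            try:
                self.alive[name] = bool(probe())
            except Exception:
                self.alive[name] = False

    def eligible(self) -> List[str]:
        """Backends that passed their latest check and may receive traffic."""
        return [name for name, ok in self.alive.items() if ok]


# Stand-in probes so the example runs without a network.
tracker = HealthTracker({"a": lambda: True, "b": lambda: False, "c": lambda: True})
tracker.run_checks()
print(tracker.eligible())
```

In a real deployment `run_checks` would be called on a timer, and a probe such as `lambda: http_probe(url)` would replace the stand-in lambdas.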
Picking which backend to send a query to can be simple or complex. A simple method
would be to alternate among the backends in a loop, a practice called round-robin. Some
backends may be more powerful than others, however, and may be selected more often using
a proportional round-robin scheme. More complex solutions include the least loaded
scheme. In this approach, a load balancer tracks how loaded each backend is and always
selects the least loaded one.
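The three selection schemes can be sketched in a few lines. The function names and the integer weights are illustrative assumptions. The proportional variant below simply repeats each backend according to its weight, which sends short bursts to the heavier backends; production load balancers often interleave the picks more evenly.

```python
import itertools
from typing import Dict, Iterator, List


def round_robin(backends: List[str]) -> Iterator[str]:
    """Alternate among the backends in a loop."""
    return itertools.cycle(backends)


def proportional_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Pick each backend in proportion to its (assumed integer) weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_loaded(loads: Dict[str, float]) -> str:
    """Return the backend currently reporting the lowest load."""
    return min(loads, key=loads.get)


rr = round_robin(["a", "b", "c"])
print([next(rr) for _ in range(5)])      # ['a', 'b', 'c', 'a', 'b']

prr = proportional_round_robin({"big": 2, "small": 1})
print([next(prr) for _ in range(6)])     # ['big', 'big', 'small', 'big', 'big', 'small']

print(least_loaded({"a": 0.8, "b": 0.3, "c": 0.5}))   # b
```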
Selecting the least loaded backend sounds reasonable, but a naive implementation can
be a disaster. A backend may not show signs of being overloaded until long after it has
actually become overloaded. This problem arises because it can be difficult to accurately
measure how loaded a system is. If the load is a measurement of the number of connections
recently sent to the server, this definition is blind to the fact that some connections may be
long lasting while others may be quick. If the measurement is based on CPU utilization,
this definition is blind to input/output (I/O) overload. Often a trailing average of the last 5
minutes of load is used. The problem with a trailing average is that it reflects the past,
not the present. As a consequence, a sharp, sudden increase in load will not
be reflected in the average for a while.
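The lag is easy to see with a small worked example. The sketch below assumes one load sample per second and a 300-sample (5-minute) window; the `TrailingAverage` class and the sample values are made up for illustration.

```python
from collections import deque


class TrailingAverage:
    """Average of the most recent `window` samples."""

    def __init__(self, window: int):
        self.samples = deque(maxlen=window)   # old samples fall off automatically

    def add(self, value: float) -> None:
        self.samples.append(value)

    def value(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


avg = TrailingAverage(window=300)

# Five quiet minutes at 20 percent load.
for _ in range(300):
    avg.add(0.2)

# The backend then runs at 100 percent for 30 seconds.
for _ in range(30):
    avg.add(1.0)

# The window now holds 270 samples of 0.2 and 30 samples of 1.0,
# so the reported load is (270 * 0.2 + 30 * 1.0) / 300 = 0.28.
print(round(avg.value(), 2))   # 0.28
```

After half a minute at full load, the average still reports 28 percent, which is the kind of stale reading that misleads a least loaded scheme.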
Imagine a load balancer with 10 backends. Each one is running at 80 percent load. A
new backend is added. Because it is new, it has no load and, therefore, is the least loaded
backend. A naive least loaded algorithm would send all traffic to this new backend; no
traffic would be sent to the other 10 backends. All too quickly, the new backend would
become absolutely swamped. There is no way a single backend could process the traffic
previously handled by 10 backends. The use of trailing averages would mean the older
backends would continue reporting artificially high loads for a few minutes while the new
backend would be reporting an artificially low load.
With this scheme, the load balancer will believe that the new machine is less loaded than
all the other machines for quite some time. In such a situation the machine may become so
overloaded that it would crash and reboot, or a system administrator trying to rectify the
situation might reboot it. When it returns to service, the cycle would start over again.
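The scenario can be reproduced with a rough simulation. Everything numeric here is an assumption chosen to match the example: one load sample per second, a 300-sample trailing window, old backends whose windows start full of 0.8, a new backend whose window starts full of 0.0, and a total workload of 8.0 (10 backends at 80 percent) that is sent each second to whichever backend reports the lowest average.

```python
from collections import deque

WINDOW = 300        # assumed: 5 minutes of one-second samples
TOTAL_LOAD = 8.0    # 10 backends at 80 percent = 8 backends' worth of work


def simulate(ticks: int) -> list:
    """Return the backend chosen on each tick by a naive least loaded scheme."""
    history = {f"old-{i}": deque([0.8] * WINDOW, maxlen=WINDOW) for i in range(10)}
    history["new"] = deque([0.0] * WINDOW, maxlen=WINDOW)
    chosen = []
    for _ in range(ticks):
        # Each backend reports the trailing average of its window.
        reported = {name: sum(h) / WINDOW for name, h in history.items()}
        target = min(reported, key=reported.get)
        chosen.append(target)
        for name, h in history.items():
            # The chosen backend receives all traffic this tick; the others sit idle.
            h.append(TOTAL_LOAD if name == target else 0.0)
    return chosen


chosen = simulate(40)
flooded = next(i for i, name in enumerate(chosen) if name != "new")
print(flooded)   # 28
```

Under these assumptions the new backend is asked to absorb eight times its capacity for 28 consecutive seconds before its average finally rises above the others. The simulation does not model the crash and reboot, which would reset the average and restart the cycle.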
Such situations make the round-robin approach look pretty good. A less naive least
loaded implementation would have some kind of control in place that would never send
more than a certain number of requests to the same machine in a row. This is called a slow
start algorithm.
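A minimal sketch of such a control is shown below: least loaded selection with a cap on consecutive picks. The class name `CappedLeastLoaded` and the choice of falling back to the second least loaded backend are assumptions for illustration, not a description of any specific product.

```python
from typing import Dict, Optional


class CappedLeastLoaded:
    """Least loaded selection that never picks one backend more than N times in a row."""

    def __init__(self, max_in_a_row: int):
        self.max_in_a_row = max_in_a_row
        self.last: Optional[str] = None
        self.streak = 0

    def pick(self, reported_load: Dict[str, float]) -> str:
        ranked = sorted(reported_load, key=reported_load.get)
        target = ranked[0]
        if target == self.last and self.streak >= self.max_in_a_row and len(ranked) > 1:
            # Streak exhausted: fall back to the runner-up for this request.
            target = ranked[1]
        if target == self.last:
            self.streak += 1
        else:
            self.last, self.streak = target, 1
        return target


balancer = CappedLeastLoaded(max_in_a_row=3)
stale_loads = {"new": 0.0, "old-0": 0.8, "old-1": 0.8}
picks = [balancer.pick(stale_loads) for _ in range(8)]
print(picks)
# ['new', 'new', 'new', 'old-0', 'new', 'new', 'new', 'old-0']
```

Even with stale load figures, the new backend can no longer receive every request. A smaller cap, or one that grows as the backend warms up, would spread the traffic further.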