Utilization Limit: Each server estimates how many more QPS it can handle and
communicates this to the load balancer. The estimates may be based on current
throughput or data gathered from synthetic load tests.
Latency: The load balancer stops forwarding requests to a backend based on the
latency of recent requests. For example, when requests are taking more than 100
ms, the load balancer assumes this backend is overloaded. This technique helps
the system cope with bursts of slow requests and pathologically overloaded
situations.
Cascade: The first replica receives all requests until it is at capacity. Any overflow
is directed to the next replica, and so on. In this case the load balancer must know
precisely how much traffic each replica can handle, usually from static configuration
based on synthetic load tests. A sketch of this policy appears after this list.
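The cascade policy is simple enough to express in a few lines. The following is
a minimal Python sketch, not taken from the original text: the Replica class and
pick_backend function are illustrative names, and the capacity numbers stand in
for values that would come from synthetic load tests.

    # Sketch of the cascade policy (illustrative). Capacities would come
    # from static configuration derived from synthetic load tests.
    from dataclasses import dataclass

    @dataclass
    class Replica:
        name: str
        capacity: int       # maximum concurrent requests, from load tests
        in_flight: int = 0  # requests currently being served

    def pick_backend(replicas):
        # Return the first replica with spare capacity; overflow spills
        # to the next replica in the ordered list.
        for r in replicas:
            if r.in_flight < r.capacity:
                return r
        raise RuntimeError("all replicas are at capacity")

    # Example: the first replica absorbs traffic until it is full.
    replicas = [Replica("web1", capacity=2), Replica("web2", capacity=2)]
    for _ in range(3):
        r = pick_backend(replicas)
        r.in_flight += 1
        print(r.name)  # prints web1, web1, web2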
4.2.3 Load Balancing with Shared State
Another issue with load balancing among many replicas is shared state. Suppose one HTTP
request generates some information that is needed by the next HTTP request. A single web
server can store that information locally so that it is available when the second HTTP re-
quest arrives. But what if the load balancer sends the next HTTP request to a different
backend? It doesn't have that information (state). This can cause confusion.
Consider the commonly encountered case in which one HTTP request takes a user's
name and password and validates them, letting the user log in. The server stores the fact that
the user is logged in and reads his or her profile from the database. This is stored locally
for fast access. Future HTTP requests to the same machine know that the user is logged in
and have the user profile on hand, so there's no need to access the database.
What if the load balancer sends the next HTTP request to a different backend? This
backend will not know that the user is logged in and will ask the user to log in again. This
is annoying to the user and creates extra work for the database.
There are a few strategies for dealing with this situation:
Sticky Connections: Load balancers have a feature called stickiness, which
means that if a user's previous HTTP request went to a particular backend, the next
one should go there as well. That solves the problem discussed earlier, at least
initially. However, if that backend dies, the load balancer has no choice but to
send requests from that user to another backend. The new backend will not know
the user is logged in, and the user will be asked to log in again. Thus, stickiness
is only a partial solution.
Shared State: In this case, the fact that the user has logged in and the user's profile
information are stored somewhere that all backends can access. For each HTTP
request, the backend consults this shared store rather than its own memory, so
any backend can serve the user. A sketch of this approach follows.
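To make the shared-state idea concrete, here is a minimal sketch, not taken from
the original text. A plain dictionary stands in for the shared store so the
example runs standalone; in production the store would be an external service,
such as memcached or Redis, reachable by every backend. SESSION_STORE,
handle_login, handle_request, and check_credentials are all hypothetical names.

    # Sketch of shared session state (illustrative). A dict stands in for
    # an external shared store so the example is self-contained.
    SESSION_STORE = {}  # in production: memcached, Redis, or a database

    def check_credentials(username, password):
        # Stub for illustration; a real system would check a database.
        return password == "secret"

    def handle_login(session_id, username, password):
        if not check_credentials(username, password):
            return "401 Unauthorized"
        # Any backend can now see that this user is logged in.
        SESSION_STORE[session_id] = {"user": username, "logged_in": True}
        return "200 OK"

    def handle_request(session_id):
        # Runs on any backend: consult the shared store, not local memory.
        session = SESSION_STORE.get(session_id)
        if session is None or not session["logged_in"]:
            return "302 Redirect to login page"
        return f"200 OK, hello {session['user']}"

    session = "abc123"
    print(handle_request(session))                    # not logged in yet
    print(handle_login(session, "alice", "secret"))   # 200 OK
    print(handle_request(session))                    # 200 OK, hello alice

Because the session record lives outside any single backend, the load balancer
is free to send each request wherever capacity allows.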