Design Patterns for Scaling - The Practice of Cloud System Administration

Information Technology Reference

In-Depth Information

um number of shards, the CPU or network bandwidth will be exhausted and performance

suffer. New hardware models should be benchmarked to determine the usable capacity be-

fore being put into service. Ideally CPU, shard storage, and network bandwidth will all top

out at the same time.

If shards are used for a read/write database, each write updates the appropriate shards.

Replicas must be kept up-to-date, abiding by the CAP Principle discussed in Chapter 1 .

In reality, shards are often used to distribute a read-only corpus of information. For ex-

ample, a search engine collects data and indexes it, producing shards of a read-only data-

base. These then need to be distributed to each search cluster replica. Distribution can be

rather complex. Transmitting an updated shard can take a long time. If it is copied over the

old data, the server cannot respond to requests on that shard until the update is complete.

If the transfer fails for some reason, the shard will be unusable. Alternatively, one could

set aside storage space for temporary use by shards as they are transmitted. Of course, this

means there is unused space when updates are not being done. To eliminate this waste,

one can stop advertising that the shard is on this machine so that the replicas will process

any requests instead. Now it can be upgraded. The process of unadvertising until no new

requests are received is called draining . One must be aware that while a shard is being

drained, there is one fewer replica—so performance may suffer. It is important to globally

coordinate shard upgrades so that enough replicas exist at any given moment to maintain

reliability and meet performance goals.

5.6 Threading

Data can be processed in different ways to achieve better scale. Simply processing one re-

quest at a time has its limits. Threading is a technique that can be used to improve system

throughput by processing many requests at the same time.

Threading is a technique used by modern operating systems to allow sequences of in-

structions to execute independently. Threads are subsets of processes; it's typically faster

to switch operations among threads than among processes. We use threading to get a fine

granularity of control over processing for use in complex algorithms.

In a single-thread process, we receive a query, process it, send the result, and get the

next query. This is simple and direct. A disadvantage is that a single long request will stall

the requests behind it. It is like wanting to buy a pack of gum but being in line behind a

person with a full shopping cart. In this so-called head of line blocking , the head of the

line is blocked by a big request. The result is high latency for requests that otherwise could

be serviced quickly.

A second disadvantage to single-threading is that in a flood of requests, some requests

will be dropped. The kernel will queue up incoming connections while waiting for the pro-

Search WWH ::

Custom Search

Home