Design Patterns for Scaling - The Practice of Cloud System Administration


• By Arbitrary Group: If a cluster of machines can reliably scale to 50,000 users, then start a new cluster for each 50,000 users. Email services often use this strategy.
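As a minimal sketch of the arbitrary-group strategy, the following Python assigns users to clusters in fixed blocks. The 50,000 figure comes from the example above; the function names and the idea of numbering users in sign-up order are assumptions made for illustration only.

```python
# Capacity figure taken from the example in the text.
USERS_PER_CLUSTER = 50_000


def cluster_for_user(user_number: int) -> int:
    """Return the zero-based cluster index for a zero-based user number.

    Assumes users are numbered in the order they signed up (hypothetical).
    """
    if user_number < 0:
        raise ValueError("user_number must be non-negative")
    return user_number // USERS_PER_CLUSTER


def clusters_needed(total_users: int) -> int:
    """Return how many clusters are required for total_users (ceiling division)."""
    if total_users < 0:
        raise ValueError("total_users must be non-negative")
    return -(-total_users // USERS_PER_CLUSTER)
```

A real service would more likely keep an explicit user-to-cluster directory so that users can be moved between clusters, but the arithmetic shows the basic idea.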

5.3.4 Combinations

Many scaling techniques combine multiple axes of the AKF Scaling Cube. Some examples include the following:

• Segment plus Replicas: Segments that are being accessed more frequently can be replicated at a greater depth. This enables scaling to larger datasets (more segments) and better performance (more replicas of a segment).

• Dynamic Replicas: Replicas are added and removed dynamically to achieve required performance. If latency is too high, add replicas. If utilization is too low, remove replicas.

• Architectural Change: Replicas are moved to faster or slower technology based on need. Infrequently accessed shards are moved to slower, less expensive technology such as disk. Shards in higher demand are moved to faster technology such as solid-state drives (SSD). Extremely old segments might be archived to tape or optical disk.
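The dynamic-replica rule in the list above (add replicas when latency is too high, remove them when utilization is too low) can be sketched as a single decision function. The thresholds, the replica limits, and the function name below are hypothetical values chosen for illustration; they do not come from the text.

```python
def adjust_replicas(
    current: int,
    latency_ms: float,
    utilization: float,
    *,
    max_latency_ms: float = 200.0,   # assumed latency target
    min_utilization: float = 0.30,   # assumed utilization floor (0.0 to 1.0)
    min_replicas: int = 1,           # assumed lower bound
    max_replicas: int = 20,          # assumed upper bound
) -> int:
    """Return the replica count to use for the next interval."""
    # Latency too high: add a replica, unless already at the ceiling.
    if latency_ms > max_latency_ms and current < max_replicas:
        return current + 1
    # Utilization too low: remove a replica, unless already at the floor.
    if utilization < min_utilization and current > min_replicas:
        return current - 1
    # Otherwise leave the replica count unchanged.
    return current
```

Checking latency before utilization means performance takes priority over cost when both conditions hold. Changing the count one replica at a time is a conservative choice that limits oscillation.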

5.4 Caching

A cache is a small data store using fast/expensive media, intended to improve a slow/cheap bigger data store. For example, recent database queries may be stored in RAM so that if the same query is repeated, the disk access can be avoided. Caching is a distinct pattern all its own, considered an optimization of the z-axis of the AKF Scaling Cube.

Consider lookups in a very large data table. If the table were stored in RAM, lookups could be very fast. Assume the data table is larger than will fit in RAM, so it is stored on disk. Lookups on the disk are slow. To improve performance, we allocate a certain amount of RAM and use it as a cache. Now when we do a lookup, first we check whether the result can be found in the cache. If it is, the result is used. This is called a cache hit. If it is not found, the normal lookup is done from the disk. This is called a cache miss. The result is returned as normal and in addition is stored in the cache so that future duplicate requests will be faster.
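The hit-and-miss flow just described can be sketched in a few lines of Python. The class name, the slow_lookup callback, and the fixed capacity are assumptions made for illustration. The text does not say how entries are discarded when the cache fills, so the least-recently-used eviction shown here is also an assumption.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Small in-RAM cache in front of a slow lookup (for example, a disk read)."""

    def __init__(self, slow_lookup, capacity=1024):
        self._slow_lookup = slow_lookup   # hypothetical function doing the disk lookup
        self._capacity = capacity         # assumed limit on cached entries
        self._entries = OrderedDict()     # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # Cache hit: the result is already in RAM, so use it.
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]

        # Cache miss: do the normal (slow) lookup.
        self.misses += 1
        value = self._slow_lookup(key)

        # Store the result so future duplicate requests are faster.
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

Tracking hits and misses makes it possible to compute a hit ratio, which indicates whether the RAM given to the cache is paying for itself.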

Figure 1.10 lists performance comparisons useful for estimating the speed of a cache hit and miss. For example, if your database is in the Netherlands and you are in California, a disk-based cache is faster if it requires fewer than 10 seeks and two or three 1MB disk reads. In contrast, if your database queries are within the same datacenter, your cache needs to be significantly faster, such as RAM or a cache server on the same subnet.
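The comparison above is simple arithmetic, sketched below. Figure 1.10 is not reproduced in this section, so the three latency constants are assumptions: they are commonly cited rough approximations, and the figure's own values should be substituted where they differ. The function names are hypothetical.

```python
# Assumed rough latencies in milliseconds (not taken from Figure 1.10).
DISK_SEEK_MS = 10.0          # one disk seek
DISK_READ_1MB_MS = 20.0      # read 1 MB sequentially from disk
CA_NL_ROUND_TRIP_MS = 150.0  # network round trip, California to Netherlands and back


def disk_cache_hit_ms(seeks: int, megabytes_read: int) -> float:
    """Estimate the time for a hit in a disk-based cache."""
    return seeks * DISK_SEEK_MS + megabytes_read * DISK_READ_1MB_MS


def cache_is_faster(seeks: int, megabytes_read: int,
                    remote_ms: float = CA_NL_ROUND_TRIP_MS) -> bool:
    """Return True if the estimated cache hit beats one remote round trip."""
    return disk_cache_hit_ms(seeks, megabytes_read) < remote_ms
```

Under these assumed numbers, 9 seeks and two 1MB reads come to about 130 ms, which beats a 150 ms round trip, while 10 seeks and three reads come to about 160 ms and do not. The estimate ignores the time the remote database spends answering the query, so it understates the benefit of the cache.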
