By Arbitrary Group: If a cluster of machines can reliably scale to 50,000 users,
then start a new cluster for each 50,000 users. Email services often use this
strategy.
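The arithmetic behind this strategy can be sketched in a few lines of Python. This is only an illustration: the function names and the assumption that users are numbered sequentially from zero are hypothetical, and only the 50,000 figure comes from the text.

```python
USERS_PER_CLUSTER = 50_000  # capacity figure used in the text


def cluster_for_user(user_number: int) -> int:
    """Return the zero-based cluster index for a zero-based user number.

    Assumes (hypothetically) that users are numbered in signup order.
    """
    if user_number < 0:
        raise ValueError("user_number must be non-negative")
    return user_number // USERS_PER_CLUSTER


def clusters_needed(total_users: int) -> int:
    """Return how many clusters are required to hold total_users."""
    return -(-total_users // USERS_PER_CLUSTER)  # ceiling division


if __name__ == "__main__":
    print(cluster_for_user(49_999))   # last user of the first cluster
    print(cluster_for_user(50_000))   # first user of the second cluster
    print(clusters_needed(120_000))   # two full clusters plus a partial one
```

A real service would more likely record each user's cluster in a directory so that users can be moved later; the integer division here only shows how the groups fill up.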
5.3.4 Combinations
Many scaling techniques combine multiple axes of the AKF Scaling Cube. Some examples
include the following:
Segment plus Replicas: Segments that are being accessed more frequently can be
replicated at a greater depth. This enables scaling to larger datasets (more
segments) and better performance (more replicas of a segment).
Dynamic Replicas: Replicas are added and removed dynamically to achieve the
required performance. If latency is too high, add replicas. If utilization is too
low, remove replicas.
Architectural Change: Replicas are moved to faster or slower technology based
on need. Infrequently accessed shards are moved to slower, less expensive
technology such as disk. Shards in higher demand are moved to faster technology
such as solid-state drives (SSD). Extremely old segments might be archived to tape
or optical disk.
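The Dynamic Replicas rule above (add replicas when latency is too high, remove them when utilization is too low) can be sketched as a small decision function. The thresholds, the one-replica step size, and the function name are all assumptions made for illustration; the text does not specify them.

```python
def desired_replicas(
    current: int,
    latency_ms: float,
    utilization: float,
    max_latency_ms: float = 200.0,   # assumed latency target
    min_utilization: float = 0.30,   # assumed lower bound on utilization
    min_replicas: int = 1,           # assumed floor
    max_replicas: int = 20,          # assumed ceiling
) -> int:
    """Return the replica count to aim for, changing by at most one step."""
    if latency_ms > max_latency_ms:
        # Latency is too high: add a replica, up to the ceiling.
        return min(current + 1, max_replicas)
    if utilization < min_utilization:
        # Utilization is too low: remove a replica, down to the floor.
        return max(current - 1, min_replicas)
    return current


if __name__ == "__main__":
    print(desired_replicas(3, latency_ms=350.0, utilization=0.90))
    print(desired_replicas(3, latency_ms=50.0, utilization=0.10))
    print(desired_replicas(3, latency_ms=50.0, utilization=0.60))
```

Latency is checked first so that a slow but lightly loaded segment is never scaled down; that ordering is a design choice of this sketch, not something the text prescribes.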
5.4 Caching
A cache is a small data store using fast/expensive media, intended to improve a slow/cheap
bigger data store. For example, recent database queries may be stored in RAM so that if the
same query is repeated, the disk access can be avoided. Caching is a distinct pattern all its
own, considered an optimization of the z-axis of the AKF Scaling Cube.
Consider lookups in a very large data table. If the table were stored in RAM, lookups
could be very fast. Assume the data table is larger than will fit in RAM, so it is stored on
disk. Lookups on the disk are slow. To improve performance, we allocate a certain amount
of RAM and use it as a cache. Now when we do a lookup, first we check whether the result
can be found in the cache. If it is, the result is used. This is called a cache hit. If it is not
found, the normal lookup is done from the disk. This is called a cache miss. The result is
returned as normal and in addition is stored in the cache so that future duplicate requests
will be faster.
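The lookup procedure just described can be sketched as follows. The class name, the slow_lookup callable standing in for the disk-based table, and the least-recently-used eviction policy are assumptions for illustration; the text only says that a fixed amount of RAM is set aside.

```python
from collections import OrderedDict


class CachedTable:
    """Wrap a slow lookup function with a bounded in-memory cache."""

    def __init__(self, slow_lookup, max_entries=1024):
        self._slow_lookup = slow_lookup    # stand-in for the disk lookup
        self._max_entries = max_entries    # stand-in for the RAM budget
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            # Cache hit: use the stored result and mark it recently used.
            self.hits += 1
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: do the normal lookup, then store the result.
        self.misses += 1
        value = self._slow_lookup(key)
        self._cache[key] = value
        if len(self._cache) > self._max_entries:
            # Assumed policy: evict the least recently used entry.
            self._cache.popitem(last=False)
        return value


if __name__ == "__main__":
    table = CachedTable(lambda key: key * 2, max_entries=2)
    table.get(1)   # miss: goes to the slow lookup
    table.get(1)   # hit: served from the cache
    print(table.hits, table.misses)
```

The hit and miss counters are there only to make the behavior observable; the ratio between them is what determines how much the cache actually helps.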
Figure 1.10 lists performance comparisons useful for estimating the speed of a cache hit
and miss. For example, if your database is in the Netherlands and you are in California, a
disk-based cache is faster if it requires fewer than 10 seeks and two or three 1 MB disk reads.
In contrast, if your database queries are within the same datacenter, your cache needs to be
significantly faster, such as RAM or a cache server on the same subnet.
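This kind of estimate can be written as a back-of-the-envelope calculation. The latency constants below are commonly quoted ballpark values chosen for illustration; they are assumptions, not figures copied from Figure 1.10, and the function names are hypothetical.

```python
# Assumed ballpark latencies in milliseconds (illustrative, not from Figure 1.10).
DISK_SEEK_MS = 10.0
DISK_READ_1MB_MS = 20.0
ROUND_TRIP_CA_NL_MS = 150.0   # California to the Netherlands and back


def disk_cache_hit_ms(seeks: int, mb_read: int) -> float:
    """Estimate the cost of serving one hit from a disk-based cache."""
    return seeks * DISK_SEEK_MS + mb_read * DISK_READ_1MB_MS


def cache_is_worthwhile(seeks: int, mb_read: int,
                        backend_ms: float = ROUND_TRIP_CA_NL_MS) -> bool:
    """A cache helps only if a hit is cheaper than going to the backend."""
    return disk_cache_hit_ms(seeks, mb_read) < backend_ms


if __name__ == "__main__":
    # Remote database: a modest disk-based cache is still a win.
    print(cache_is_worthwhile(seeks=5, mb_read=2))
    # Same-datacenter database (assumed 0.5 ms round trip): disk is too slow.
    print(cache_is_worthwhile(seeks=1, mb_read=1, backend_ms=0.5))
```

Substituting measured numbers for the assumed constants turns this into a quick check of whether a proposed cache medium is fast enough for a given backend.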