Information Technology Reference
In-Depth Information
The Power of 2
We use a power of 2 to optimize the hash-to-shard mapping process. When you
want the remainder of the hash when divided by 2 n , you just need to look at the
last n bits of the hash. This is a very fast operation, much faster than getting the
modulus using a number that is not a power of 2.
Shards can be replicated on multiple machines to improve performance. With such an
approach, each replica processes a share of the queries destined for that shard. Replication
can also provide better availability.Ifmultiple machines store anyshard,then anymachine
can crash or be taken down for maintenance and the other replicas will continue to service
the requests.
As more data are stored, the shards may outgrow the machine. The database can then be
split over twice as many shards. The number of shards doubles, and the old data is divided
among the new shards. Each key's hash is evaluated to determine if the key should stay in
thecurrentshardorifitbelongstothenewshard.Systemsthatperformthisstepwhilelive
have complex algorithms to manage queries received during the expansion.
It is rather inflexible to require that the number of segments be a power of 2. For ex-
ample, going from one shard to two requires adding just one machine, which is easy to
purchase.However,asthesystemgrows,youmayfindyourselfneedingtogofrom32ma-
chines to 64 machines, which is quite a large purchase. The next jump is twice as big. If
thenewmachinesaremorepowerful,thisextracapacitywillbewasteduntilallthesmaller
machinesareeliminated.Also,whilethenumberofkeysisevenlydividedbetweenshards,
each key may not store exactly the same amount of data. Thus a machine may have one
fourthofthekeys,butmoredatathancanfitonthemachine.Onemustincreasethenumber
of shards based on the largest shard, which could be considerably bigger than the smallest.
The solution to these problems is to create more, smaller shards and store multiple
shards on each machine. Now you can vary the number of shards on a machine to com-
pensate for uneven shard sizes and different machine sizes. For example, you could divide
thehashvaluebyalargerpowerof2andproduce,forexample,8192buckets.Dividethose
buckets across as many machines as needed. For example, if there are three machines, one
twice as large as the other two, the larger machine might store keys that fall into buck-
et 0...4095 (4096 total) in the larger server and store buckets 4096...6143 and 6144...8191
(2048 keys each) in the second and third machines, respectively.
Asnew,morepowerfulhardwarebecomesavailable,onecanpackmoreshardsonama-
chine. More shards mean more queries will be directed to that machine. Thus, more CPU
and network bandwidth is required. It is possible that when the machine stores the maxim-
Search WWH ::




Custom Search