Database Reference
In-Depth Information
Featture--based s
l segmenttattiion
This is the approach taken by Randy Shoup, Distinguished Architect at eBay, who in 2006
helped bring their architecture into maturity to support many billions of queries per day.
Using this strategy, the data is split not by dividing records in a single table (as in the custom-
er example discussed earlier), but rather by splitting into separate databases the features that
don't overlap with each other very much. For example, at eBay, the users are in one shard,
and the items for sale are in another. At Flixster, movie ratings are in one shard and com-
ments are in another. This approach depends on understanding your domain so that you can
segment data cleanly.
d shard o
d or f
r functtiional s
Key--based d sharding
In this approach, you find a key in your data that will evenly distribute it across shards. So
instead of simply storing one letter of the alphabet for each server as in the (naive and im-
proper) earlier example, you use a one-way hash on a key data element and distribute data
across machines according to the hash. It is common in this strategy to find time-based or
numeric keys to hash on.
Lookup p table
In this approach, one of the nodes in the cluster acts as a “yellow pages” directory and looks
up which node has the data you're trying to access. This has two obvious disadvantages. The
first is that you'll take a performance hit every time you have to go through the lookup table
as an additional hop. The second is that the lookup table not only becomes a bottleneck, but
a single point of failure.
NOTE
To read about how they used data sharding strategies to improve performance at Flixster, see ht-
tp://lsvp.wordpress.com/2008/06/20 .
Sharding can minimize contention depending on your strategy and allows you not just to scale
horizontally, but then to scale more precisely, as you can add power to the particular shards that
need it.
Sharding could be termed a kind of “shared-nothing” architecture that's specific to databases. A
shared-nothingarchitecture is one in which there is no centralized (shared) state, but each node
in a distributed system is independent, so there is no client contention for shared resources. The
term was first coined by Michael Stonebraker at University of California at Berkeley in his 1986
paper “The Case for Shared Nothing.”
Search WWH ::




Custom Search