Introducing Cassandra - Cassandra: The Definitive Guide

Database Reference

In-Depth Information

Featture--based s

l segmenttattiion

This is the approach taken by Randy Shoup, Distinguished Architect at eBay, who in 2006

helped bring their architecture into maturity to support many billions of queries per day.

Using this strategy, the data is split not by dividing records in a single table (as in the custom-

er example discussed earlier), but rather by splitting into separate databases the features that

don't overlap with each other very much. For example, at eBay, the users are in one shard,

and the items for sale are in another. At Flixster, movie ratings are in one shard and com-

ments are in another. This approach depends on understanding your domain so that you can

segment data cleanly.

d shard o

d or f

r functtiional s

Key--based d sharding

In this approach, you find a key in your data that will evenly distribute it across shards. So

instead of simply storing one letter of the alphabet for each server as in the (naive and im-

proper) earlier example, you use a one-way hash on a key data element and distribute data

across machines according to the hash. It is common in this strategy to find time-based or

numeric keys to hash on.

Lookup p table

In this approach, one of the nodes in the cluster acts as a “yellow pages” directory and looks

up which node has the data you're trying to access. This has two obvious disadvantages. The

first is that you'll take a performance hit every time you have to go through the lookup table

as an additional hop. The second is that the lookup table not only becomes a bottleneck, but

a single point of failure.

NOTE

To read about how they used data sharding strategies to improve performance at Flixster, see ht-

Sharding can minimize contention depending on your strategy and allows you not just to scale

horizontally, but then to scale more precisely, as you can add power to the particular shards that

need it.

Sharding could be termed a kind of “shared-nothing” architecture that's specific to databases. A

shared-nothingarchitecture is one in which there is no centralized (shared) state, but each node

in a distributed system is independent, so there is no client contention for shared resources. The

term was first coined by Michael Stonebraker at University of California at Berkeley in his 1986

paper “The Case for Shared Nothing.”

Search WWH ::

Custom Search

Home