A good way to start is to draw your data model as an entity-relationship diagram, or
with an equivalent tool that shows all the entities and their relationships. Try to lay out
the diagram so that the related entities are close together. You can often inspect such
a diagram visually and find candidates for partitioning keys that you'd otherwise miss.
Don't just look at the diagram, though; consider your application's queries as well.
Even if two entities are related in some way, if you seldom or never join on the
relationship, you can break the relationship to implement the sharding.
Some data models are easier to shard than others, depending on the degree of
connectivity in the entity-relationship graph. Figure 11-8 depicts an easily sharded data model
on the left, and one that's difficult to shard on the right.
Figure 11-8. Two data models, one easy to shard and the other difficult
The data model on the left is easy to shard because it has many connected subgraphs
consisting mostly of nodes with just one connection, and you can “cut” the connections
between the subgraphs relatively easily. The model on the right is hard to shard, because
there are no such subgraphs. Most data models, luckily, look more like the lefthand
diagram than the righthand one.
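
The idea of cutting seldom-used relationships and then looking for connected subgraphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method prescribed by the text: the entity names, the joins-per-day figures, and the `min_joins_per_day` threshold are all hypothetical, and in practice you would estimate join frequency from your own query logs.

```python
from collections import defaultdict


def shardable_groups(entities, relationships, min_joins_per_day=1):
    """Return sets of entities that should stay together on one shard.

    relationships is an iterable of (entity_a, entity_b, joins_per_day).
    Relationships joined less often than min_joins_per_day are treated
    as "cuttable" and ignored (a hypothetical rule of thumb).
    """
    # Build an undirected adjacency list from the frequently joined edges.
    neighbors = defaultdict(set)
    for a, b, joins in relationships:
        if joins >= min_joins_per_day:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen = set()
    groups = []
    for start in entities:
        if start in seen:
            continue
        # Depth-first walk that collects one connected subgraph.
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(group)
    return groups


# Hypothetical data model; the numbers are invented for illustration.
entities = ["user", "post", "comment", "product", "review", "country"]
relationships = [
    ("user", "post", 50000),
    ("post", "comment", 80000),
    ("product", "review", 20000),
    ("user", "review", 3),    # seldom joined, a candidate to cut
    ("user", "country", 0),   # never joined
]

groups = shardable_groups(entities, relationships, min_joins_per_day=100)
for group in groups:
    print(sorted(group))
```

Many small groups suggest a model like the one on the left of the figure. A single group containing nearly every entity suggests the harder case on the right, where no choice of threshold separates the graph cleanly.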
When choosing a partitioning key, try to pick something that lets you avoid cross-shard
queries as much as possible, but also makes shards small enough that you won't have
problems with disproportionately large chunks of data. You want the shards to end up
uniformly small, if possible, and if not, at least small enough that they're easy to balance
by grouping different numbers of shards together. For example, if your application is
US-only and you want to divide your dataset into 20 shards, you probably shouldn't
shard by state, because California has such a huge population. But you could shard by
county or telephone area code, because even though these won't be uniformly
populated, there are enough of them that you can still choose 20 sets that will be roughly
 