Scaling MySQL - High Performance MySQL

Databases Reference

In-Depth Information

queries the helper application, which is often a web service or a worker service such as

Gearman.

Cross-shard queries can also benefit from summary tables. You can build them by

traversing all the shards and storing the results redundantly on each shard when they're

complete. If duplicating the data on each shard would be too wasteful, you can con-

solidate the summary tables onto another data store, so they're stored only once.

Nonsharded data often lives in the global node, with heavy caching to shield it from

the load.

Some applications use essentially random sharding when perfectly even data distribu-

tion is important, or when there is no good partitioning key. A distributed search ap-

plication is a good example. In this case, cross-shard queries and aggregation are the

norm, not the exception.

Querying across shards isn't the only thing that's harder with sharding. Maintaining

data consistency is also difficult. Foreign keys won't work across shards, so the normal

solution is to check referential integrity as needed in the application, or use foreign keys

within a shard, because internal consistency within a shard might be the most important

thing. It's possible to use XA transactions, but this is uncommon in practice because

of the overhead.

You can also design cleanup processes that run intermittently. For example, if a user's

book club account expires, you don't have to remove it immediately. You can write a

periodic job to remove the user's comments from the per-book shard, and you can build

a checker script that runs periodically and makes sure the data is consistent across the

shards.

Allocating data, shards, and nodes

Shards and nodes don't have to have a one-to-one relationship. It's often a good idea

to make a shard's size much smaller than a node's capacity, so you can store multiple

shards on a single node.

Keeping each shard small helps keep the data manageable. It makes it easier to do

database backups and recovery, and if the tables are small, it can ease jobs such as

schema changes. For example, suppose you have a 100 GB table that you can either

store as it is or split into 100 shards of 1 GB tables, which you would store on a single

node. Now suppose you want to add an index to the table(s). This would take much

longer on a 100 GB shard than it would on all the 1 GB shards combined, because the

1 GB shards fit completely in memory. You also might need to make the data unavail-

able while ALTER TABLE is running, and blocking 1 GB of data is much better than

blocking 100 GB.

Search WWH ::

Custom Search

Home