Scaling MySQL - High Performance MySQL

Databases Reference

In-Depth Information

• Aggregating results across shards

• Cross-shard joins

• Locking and transaction management

• Creating shards (or at least discovering new shards on the fly) and rebalancing

shards if you have time to implement this

You might not have to build your own sharding infrastructure from scratch. There are

several tools and systems that either provide some of the necessary functionality or are

specifically designed to implement a sharded architecture.

One database abstraction layer with sharding support is Hibernate Shards ( http://shards

.hibernate.org ) , Google's extension to the open source Java-based Hibernate object-

relational mapping (ORM) library. It provides shard-aware implementations of the

Hibernate Core interfaces, so applications don't necessarily have to be redesigned to

use a sharded data store; in fact, they might not even have to be aware that they're using

one. Hibernate Shards uses a fixed allocation strategy to allocate data to the shards.

Another Java sharding system is HiveDB ( http://www.hivedb.org ) .

In PHP, you can use Justin Swanhart's Shard-Query system ( http://code.google.com/p/

shard-query/ ) , which automatically decomposes queries, executes them in parallel, and

combines the results. Commercial systems targeted at similar use cases are ScaleBase

( http://www.scalebase.com ), ScalArc ( http://www.scalarc.com ) , and dbShards ( http://

www.dbshards.com ) .

Sphinx is a full-text search engine, not a sharded data storage and retrieval system, but

it is still useful for some types of queries across sharded data stores. It can query remote

systems in parallel and aggregate the results. Sphinx is discussed further in Appendix F .

Scaling by Consolidation

A heavily sharded architecture creates an opportunity to get more out of your hardware.

Our research and experience have shown that MySQL can't use the full power of

modern hardware. As you scale beyond about 24 CPU cores, MySQL's efficiency starts

to level off. A similar thing happens beyond 128 MB of memory, and MySQL can't even

come close to using the full I/O power of high-end PCIe flash devices such as Virident

and Fusion-io cards.

Instead of using a single server instance on a powerful machine, there's another option.

You can make your shards small enough that you can fit several per machine (a practice

we've been advocating anyway), and run several instances per server, carving up the

server's physical resources to give each instance a portion.

This actually works, although we wish it weren't necessary. It's a combination of the

scale-up and scale-out approaches. You can do it in other ways—you don't have to use

sharding—but sharding is a natural fit for consolidation on large servers.

Search WWH ::

Custom Search

Home