Databases Reference
In-Depth Information
• Aggregating results across shards
• Cross-shard joins
• Locking and transaction management
• Creating shards (or at least discovering new shards on the fly) and rebalancing
shards if you have time to implement this
You might not have to build your own sharding infrastructure from scratch. There are
several tools and systems that either provide some of the necessary functionality or are
specifically designed to implement a sharded architecture.
One database abstraction layer with sharding support is Hibernate Shards ( http://shards
.hibernate.org ) , Google's extension to the open source Java-based Hibernate object-
relational mapping (ORM) library. It provides shard-aware implementations of the
Hibernate Core interfaces, so applications don't necessarily have to be redesigned to
use a sharded data store; in fact, they might not even have to be aware that they're using
one. Hibernate Shards uses a fixed allocation strategy to allocate data to the shards.
Another Java sharding system is HiveDB ( http://www.hivedb.org ) .
In PHP, you can use Justin Swanhart's Shard-Query system ( http://code.google.com/p/
shard-query/ ) , which automatically decomposes queries, executes them in parallel, and
combines the results. Commercial systems targeted at similar use cases are ScaleBase
( http://www.scalebase.com ), ScalArc ( http://www.scalarc.com ) , and dbShards ( http://
www.dbshards.com ) .
Sphinx is a full-text search engine, not a sharded data storage and retrieval system, but
it is still useful for some types of queries across sharded data stores. It can query remote
systems in parallel and aggregate the results. Sphinx is discussed further in Appendix F .
Scaling by Consolidation
A heavily sharded architecture creates an opportunity to get more out of your hardware.
Our research and experience have shown that MySQL can't use the full power of
modern hardware. As you scale beyond about 24 CPU cores, MySQL's efficiency starts
to level off. A similar thing happens beyond 128 MB of memory, and MySQL can't even
come close to using the full I/O power of high-end PCIe flash devices such as Virident
and Fusion-io cards.
Instead of using a single server instance on a powerful machine, there's another option.
You can make your shards small enough that you can fit several per machine (a practice
we've been advocating anyway), and run several instances per server, carving up the
server's physical resources to give each instance a portion.
This actually works, although we wish it weren't necessary. It's a combination of the
scale-up and scale-out approaches. You can do it in other ways—you don't have to use
sharding—but sharding is a natural fit for consolidation on large servers.
 
Search WWH ::




Custom Search