Databases Reference
In-Depth Information
• Use a single database per shard, and include the shard number in both the database
and table names (e.g., the table name would become bookclub_23.comments_23 ).
• Run multiple MySQL instances per node, each with one or more shards, arranged
in any sensible combination of the ways we've just mentioned.
If you include the shard number in the table name, you'll need some way to insert the
shard number into templated queries. Typical practices include special “magic” place-
holder values in queries, sprintf() -style formatting specifications such as %s , and string
interpolation with variables. Here is one way you can create templated queries in PHP:
$sql = "SELECT book_id, book_title FROM bookclub_%d.comments_%d... ";
$res = mysql_query(sprintf($sql, $shardno, $shardno), $conn);
You could also just use string interpolation:
$sql = "SELECT book_id, book_title FROM bookclub_$shardno.comments_$shardno ...";
$res = mysql_query($sql, $conn);
This is easy to build into a new application, but it might be harder for existing appli-
cations. When we're building new applications and query templating isn't an issue, we
like to use a single database per shard, with the shard number in both the database and
the table name. It adds some complexity for jobs such as scripting ALTER TABLE , but it
has advantages, too:
• You can move a shard easily if it's completely contained in a single database.
• Because a database is a directory in the filesystem, you can manage a shard's files
easily.
• It's easy to find out how large the shard is if it isn't mixed up with other shards.
• The globally unique table names help avoid mistakes. If table names are the same
everywhere, it's easy to accidentally query the wrong shard because you connected
to the wrong node, or import one shard's data into another shard's tables.
You might want to consider whether your application's data has any shard affinity . You
might benefit from placing certain shards “near” each other (on the same server, on the
same subnet, in the same data center, or on the same switch) to exploit some similarity
in the data access patterns. For example, you can shard by user and then place users
from the same country into shards on the same nodes.
Adding sharding support to an existing application often results in one shard per
node. This simplification helps limit how much you need to change the application's
queries. Sharding is usually a pretty disruptive change for an application, so it makes
sense to simplify where possible. If you shard so each node looks like a miniature copy
of the whole application's data, you might not have to change most of the queries or
worry about routing queries to the desired node.
 
Search WWH ::




Custom Search