Scaling MySQL - High Performance MySQL

Databases Reference

In-Depth Information

• Use a single database per shard, and include the shard number in both the database

and table names (e.g., the table name would become bookclub_23.comments_23 ).

• Run multiple MySQL instances per node, each with one or more shards, arranged

in any sensible combination of the ways we've just mentioned.

If you include the shard number in the table name, you'll need some way to insert the

shard number into templated queries. Typical practices include special “magic” place-

holder values in queries, sprintf() -style formatting specifications such as %s , and string

interpolation with variables. Here is one way you can create templated queries in PHP:

$sql = "SELECT book_id, book_title FROM bookclub_%d.comments_%d... ";

$res = mysql_query(sprintf($sql, $shardno, $shardno), $conn);

You could also just use string interpolation:

$sql = "SELECT book_id, book_title FROM bookclub_$shardno.comments_$shardno ...";

$res = mysql_query($sql, $conn);

This is easy to build into a new application, but it might be harder for existing appli-

cations. When we're building new applications and query templating isn't an issue, we

like to use a single database per shard, with the shard number in both the database and

the table name. It adds some complexity for jobs such as scripting ALTER TABLE , but it

has advantages, too:

• You can move a shard easily if it's completely contained in a single database.

• Because a database is a directory in the filesystem, you can manage a shard's files

easily.

• It's easy to find out how large the shard is if it isn't mixed up with other shards.

• The globally unique table names help avoid mistakes. If table names are the same

everywhere, it's easy to accidentally query the wrong shard because you connected

to the wrong node, or import one shard's data into another shard's tables.

You might want to consider whether your application's data has any shard affinity . You

might benefit from placing certain shards “near” each other (on the same server, on the

same subnet, in the same data center, or on the same switch) to exploit some similarity

in the data access patterns. For example, you can shard by user and then place users

from the same country into shards on the same nodes.

Adding sharding support to an existing application often results in one shard per

node. This simplification helps limit how much you need to change the application's

queries. Sharding is usually a pretty disruptive change for an application, so it makes

sense to simplify where possible. If you shard so each node looks like a miniature copy

of the whole application's data, you might not have to change most of the queries or

worry about routing queries to the desired node.

Search WWH ::

Custom Search

Home