Databases Reference
In-Depth Information
numbers, especially if you move them into different roles as you add more servers,
or when you recover from failures.
Create a table in the global node
You can create a table with an AUTO_INCREMENT column in your global database
node, and applications can use this to generate unique numbers.
Use memcached
There's an incr() function in the memcached API that can increment a number
atomically and return the result. You can use Redis, too.
Allocate numbers in batches
The application can request a batch of numbers from a global node, use all the
numbers, and then request more.
Use a combination of values
You can use a combination of values, such as the shard ID and an incrementing
number, to make each server's values unique. See the discussion of this technique
in the previous section.
Use GUID values
You can generate globally unique values with the UUID() function. Beware, though:
this function does not replicate correctly with statement-based replication, al-
though it works fine if your application selects the value into its own memory and
then uses it as a literal in statements. GUID values are large and nonsequential, so
they don't make good primary keys for InnoDB tables. See “Inserting rows in pri-
mary key order with InnoDB” on page 173 for more on this. There's also a
UUID_SHORT() function in MySQL 5.1 and newer versions, which has some nice
properties such as being sequential and only 64 bits instead of 128.
If you use a global allocator to generate values, be careful that the single point of con-
tention doesn't create a bottleneck for your application.
Although the memcached approach can be very fast (tens of thousands of values per
second), it isn't persistent. Each time you restart the memcached service, you'll need to
initialize the value in the cache. This could require you to find the maximum value
that's in use across all shards, which might be very slow and difficult to do atomically.
Tools for sharding
One of the first things you'll have to do when designing a sharded application is write
code for querying multiple data sources.
It's generally a poor design to expose the multiple data sources to the application
without any abstraction, because it can add a lot of code complexity. It's better to hide
the data sources behind an abstraction layer. This layer might handle the following
tasks:
• Connecting to the correct shard and querying it
• Distributed consistency checks
 
Search WWH ::




Custom Search