Keeping active data separate
Even if you don't actually move stale data away to another server, many applications can benefit from separating active and inactive datasets. This helps keep caches efficient, and enables you to use different kinds of hardware or application architectures for the active and inactive data. Here are some ways to accomplish this:
Splitting tables into parts
It's often smart to split tables, especially if the entire table won't fit in memory. For example, you can split the users table into active_users and inactive_users. You might think this isn't necessary because the database will cache only the "hot" data anyway, but that depends on your storage engine. If you use InnoDB, caching works a page at a time. If you can fit 100 users on a page and only 10% of your users are active, that probably makes every page "hot" from InnoDB's point of view—yet 90% of each "hot" page will be wasted space. Splitting the table in two could improve your memory usage dramatically.
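Such a split might be sketched as follows. This assumes a hypothetical last_active column on the users table as the activity criterion; the 30-day cutoff is likewise illustrative, not a recommendation:

```sql
-- Hypothetical split of users into active and inactive tables.
-- Assumes users has a last_active column; adjust the cutoff to taste.
CREATE TABLE active_users   LIKE users;
CREATE TABLE inactive_users LIKE users;

INSERT INTO active_users
    SELECT * FROM users
    WHERE last_active >= CURRENT_DATE - INTERVAL 30 DAY;

INSERT INTO inactive_users
    SELECT * FROM users
    WHERE last_active <  CURRENT_DATE - INTERVAL 30 DAY;
```

A periodic job would then migrate rows between the two tables as users cross the activity threshold.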
MySQL partitioning
MySQL 5.1 offers natively partitioned tables, which can help keep the most recent
data in memory. See Chapter 7 for more about partitioning.
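A range partition on a date column is one way to keep recent rows clustered together. The following is a minimal sketch using a hypothetical posts table (the table name, columns, and boundary dates are assumptions for illustration); note that in MySQL the partitioning expression must be part of every unique key, which is why created appears in the primary key:

```sql
-- Hypothetical time-based range partitioning (MySQL 5.1+ syntax).
CREATE TABLE posts (
    post_id int unsigned not null,
    created date         not null,
    body    text,
    PRIMARY KEY (post_id, created)
) PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p_old    VALUES LESS THAN (TO_DAYS('2012-01-01')),
    PARTITION p_recent VALUES LESS THAN MAXVALUE
);
```

Queries that filter on created can then be satisfied from the recent partition alone, which is the data most likely to be in memory.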
Time-based data partitioning
If your application continually gets new data, it's likely that the newest data will
be far more active than the older data. For example, we know of one blog service
whose traffic is mostly from posts and comments created in the last seven days.
Most of its updates are to the same set of data. As a result, this data is kept entirely
in memory, with replication to ensure there is a recoverable copy on disk if there's
a failure. The rest of the data lives forever in another location.
We've also seen designs that store each user's data in shards on two nodes. New
data goes to the “active” node, which has a lot of memory and fast disks. This data
is optimized for very fast access. The other node stores older data, with very large
(but slower) disks. The application assumes that it's not likely to need the older
data. This is a good assumption for a lot of applications, which might be able to
satisfy 90% or more of requests from only the most recent 10% of the data.
You can implement this sharding policy easily with dynamic sharding. For example, your sharding directory's table definition might look something like the following:
CREATE TABLE users (
    user_id           int unsigned not null,
    shard_new         int unsigned not null,
    shard_archive     int unsigned not null,
    archive_timestamp timestamp,
    PRIMARY KEY (user_id)
);
An archive script can move older data from the active node to the archive node, updating the archive_timestamp column when it moves a user's data to the archive node.
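The bookkeeping step of such a script might look like the following sketch against the directory table above (the user_id value is a placeholder; the actual row copying between nodes happens outside this statement):

```sql
-- Hypothetical archive step: after copying a user's old rows to the
-- archive node, record when the move happened in the sharding directory.
UPDATE users
   SET archive_timestamp = NOW()
 WHERE user_id = 1234;
```

The application can then route each request: data newer than archive_timestamp is read from the shard named by shard_new, and anything older from shard_archive.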