Keeping active data separate
Even if you don't actually move stale data away to another server, many applications can benefit from separating active and inactive datasets. This helps keep caches efficient, and enables you to use different kinds of hardware or application architectures for the active and inactive data. Here are some ways to accomplish this:
Splitting tables into parts
It's often smart to split tables, especially if the entire table won't fit in memory. For example, you can split the users table into active_users and inactive_users. You might think this isn't necessary because the database will cache only the "hot" data anyway, but that depends on your storage engine. If you use InnoDB, caching works a page at a time. If you can fit 100 users on a page and only 10% of your users are active, that probably makes every page "hot" from InnoDB's point of view—yet 90% of each "hot" page will be wasted space. Splitting the table in two could improve your memory usage dramatically.
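Such a split might be sketched as follows. This assumes a hypothetical last_active column on the users table as the activity criterion; the 30-day cutoff is likewise illustrative, not a recommendation:

```sql
-- Hypothetical split of users into active and inactive tables.
-- Assumes users has a last_active column; adjust the cutoff to taste.
CREATE TABLE active_users   LIKE users;
CREATE TABLE inactive_users LIKE users;

INSERT INTO active_users
    SELECT * FROM users
    WHERE last_active >= CURRENT_DATE - INTERVAL 30 DAY;

INSERT INTO inactive_users
    SELECT * FROM users
    WHERE last_active <  CURRENT_DATE - INTERVAL 30 DAY;
```

A periodic job would then migrate rows between the two tables as users cross the activity threshold.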
MySQL partitioning
MySQL 5.1 offers natively partitioned tables, which can help keep the most recent
data in memory. See Chapter 7 for more about partitioning.
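A range partition on a date column is one way to keep recent rows clustered together. The following is a minimal sketch using a hypothetical posts table (the table name, columns, and boundary dates are assumptions for illustration); note that in MySQL the partitioning expression must be part of every unique key, which is why created appears in the primary key:

```sql
-- Hypothetical time-based range partitioning (MySQL 5.1+ syntax).
CREATE TABLE posts (
    post_id int unsigned not null,
    created date         not null,
    body    text,
    PRIMARY KEY (post_id, created)
) PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p_old    VALUES LESS THAN (TO_DAYS('2012-01-01')),
    PARTITION p_recent VALUES LESS THAN MAXVALUE
);
```

Queries that filter on created can then be satisfied from the recent partition alone, which is the data most likely to be in memory.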
Time-based data partitioning
If your application continually gets new data, it's likely that the newest data will
be far more active than the older data. For example, we know of one blog service
whose traffic is mostly from posts and comments created in the last seven days.
Most of its updates are to the same set of data. As a result, this data is kept entirely
in memory, with replication to ensure there is a recoverable copy on disk if there's
a failure. The rest of the data lives forever in another location.
We've also seen designs that store each user's data in shards on two nodes. New
data goes to the “active” node, which has a lot of memory and fast disks. This data
is optimized for very fast access. The other node stores older data, with very large
(but slower) disks. The application assumes that it's not likely to need the older
data. This is a good assumption for a lot of applications, which might be able to
satisfy 90% or more of requests from only the most recent 10% of the data.
You can implement this sharding policy easily with dynamic sharding. For example, your sharding directory's table definition might look something like the following:
CREATE TABLE users (
    user_id           int unsigned not null,
    shard_new         int unsigned not null,
    shard_archive     int unsigned not null,
    archive_timestamp timestamp,
    PRIMARY KEY (user_id)
);
An archive script can move older data from the active node to the archive node, updating the archive_timestamp column when it moves a user's data to the archive node.
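The bookkeeping step of such a script might look like the following sketch against the directory table above (the user_id value is a placeholder; the actual row copying between nodes happens outside this statement):

```sql
-- Hypothetical archive step: after copying a user's old rows to the
-- archive node, record when the move happened in the sharding directory.
UPDATE users
   SET archive_timestamp = NOW()
 WHERE user_id = 1234;
```

The application can then route each request: data newer than archive_timestamp is read from the shard named by shard_new, and anything older from shard_archive.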