Application-Level Optimization - High Performance MySQL

expensive to generate the cached data (which the writer process might already

have). If you update the cached data, future requests won't have to wait for the

application to generate it. If you do invalidations in the background, such as

TTL-based invalidations, you can generate new versions of the invalidated data in a

process that's completely detached from any user request.

Invalidation on read

Instead of invalidating stale data when you change the source data from which it's

derived, you can store some information that lets you determine whether the data

has expired when you read it from the cache. This has a significant advantage over

explicit invalidation: it has a fixed cost that you can spread out over time. Suppose

you invalidate an object upon which a million cached objects depend. If you

invalidate on write, you have to invalidate a million things in the cache in one hit,

which could take a long time even if you have an efficient way to find them. If you

invalidate on read, the write can complete immediately, and each of a million reads

will be delayed slightly. This spreads out the cost of the invalidation and helps

avoid spikes of load and long latencies.

One of the simplest ways to do invalidation on read is with object versioning. With this

approach, when you store an object in the cache, you also store the current version

number or timestamp of the data upon which it depends. For example, suppose you're

caching statistics about a user's blog posts, including the number of posts the user has

made. When you cache the blog_stats object, you store the user's current version

number with it, because the statistics are dependent on the user.

Whenever you update some data that also depends on the user, you update the user's

version number. Suppose the user's version is initially 0, and you generate and cache

the statistics. When the user publishes a blog post, you increase the user's version to

1 (you'd store this with the blog post too, though we don't really need it for this

example). Then, when you need to display the statistics, you compare the cached

blog_stats object's version to the cached user's version. Because the user's version is

greater than the object's version, you know that the statistics are stale and you need to

recompute them.
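
The blog statistics example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain dictionaries stand in for the cache and the database, and the names publish_post, get_blog_stats, and compute_blog_stats are hypothetical, not part of any real library.

```python
# Hypothetical in-memory stand-ins for the cache and the source data.
stats_cache = {}     # ("blog_stats", user_id) -> (version_when_cached, stats)
user_versions = {}   # user_id -> current version number (starts at 0)
posts = {}           # user_id -> list of post titles
recompute_log = []   # records each recomputation, for demonstration only


def publish_post(user_id, title):
    """Write path: change the source data and bump the user's version."""
    posts.setdefault(user_id, []).append(title)
    user_versions[user_id] = user_versions.get(user_id, 0) + 1


def compute_blog_stats(user_id):
    """The expensive step we want to avoid repeating."""
    recompute_log.append(user_id)
    return {"post_count": len(posts.get(user_id, []))}


def get_blog_stats(user_id):
    """Read path: compare versions and recompute only when stale."""
    current = user_versions.get(user_id, 0)
    key = ("blog_stats", user_id)
    entry = stats_cache.get(key)
    # The entry is fresh if it was cached at the user's current version.
    if entry is not None and entry[0] >= current:
        return entry[1]
    # Missing, or the user's version is greater than the cached version.
    stats = compute_blog_stats(user_id)
    stats_cache[key] = (current, stats)
    return stats
```

Note that publish_post never touches stats_cache. The write only increments a counter, and the cost of refreshing the statistics is paid by the next read that needs them.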

This is a pretty coarse way to invalidate content, because it assumes that every bit of

data that's dependent on the user also interacts with all other data. That's not always

true. If a user edits a blog post, for example, you'll increment the user's version, and

that will invalidate the stored statistics even though the statistics (the number of blog

posts) didn't really change. The trade-off is simplicity. A simple cache invalidation

policy isn't just easier to build; it might be more efficient, too.

Object versioning is a simplified approach to a tagged cache, which can handle more

complex dependencies. A tagged cache knows about different kinds of dependencies

and tracks versions separately for each of them. To return to the book club example

from Chapter 11, you could make the cached comments dependent on the user's

version and the book's version by tagging the comments with these version numbers:
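
The book's own listing is not part of this excerpt. As a stand-in, here is a minimal Python sketch of the general idea, not the authors' code. It assumes plain dictionaries for the cache and the tag versions; the tag names, cache key format, and the functions bump, tagged_set, and tagged_get are all hypothetical.

```python
# Hypothetical in-memory stand-ins; a real system would keep these in the
# cache server itself so every application process sees the same versions.
tag_versions = {}   # tag, e.g. ("user", 1) -> current version number
tagged_cache = {}   # key -> (snapshot of tag versions at cache time, value)


def bump(tag):
    """Write path: record that data behind this tag has changed."""
    tag_versions[tag] = tag_versions.get(tag, 0) + 1


def tagged_set(key, value, tags):
    """Store a value together with the current version of each tag."""
    snapshot = {tag: tag_versions.get(tag, 0) for tag in tags}
    tagged_cache[key] = (snapshot, value)


def tagged_get(key):
    """Return the cached value, or None if it is missing or stale."""
    entry = tagged_cache.get(key)
    if entry is None:
        return None
    snapshot, value = entry
    for tag, cached_version in snapshot.items():
        # Any tag that has moved past its cached version makes the entry stale.
        if tag_versions.get(tag, 0) > cached_version:
            return None
    return value
```

Because each tag has its own version, a change to an unrelated user or book leaves the cached comments valid, which the single per-user version number cannot express.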
