expensive to generate the cached data (which the writer process might already
have). If you update the cached data, future requests won't have to wait for the
application to generate it. If you do invalidations in the background, such as TTL-
based invalidations, you can generate new versions of the invalidated data in a
process that's completely detached from any user request.
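Here is a minimal sketch of the background approach in Python. It assumes that an in-process dictionary stands in for the cache and that each entry stores its own expiry time. `refresh_expired` and the `regenerate` callback are hypothetical names. A scheduler or background thread, not a user request, would call the function periodically.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cache layout: key -> (value, expires_at), times in seconds.
Cache = Dict[str, Tuple[object, float]]


def refresh_expired(cache: Cache, regenerate: Callable[[str], object],
                    ttl: float, now: float) -> List[str]:
    """Regenerate every entry whose TTL has passed and return their keys.

    Intended to run from a background job, so no user request waits for
    the regeneration. Entries are overwritten in place rather than deleted,
    so readers keep getting the old value until the new one is stored.
    """
    refreshed: List[str] = []
    for key, (_, expires_at) in list(cache.items()):  # copy: we mutate cache
        if expires_at <= now:
            cache[key] = (regenerate(key), now + ttl)
            refreshed.append(key)
    return refreshed


if __name__ == "__main__":
    cache = {"stats:1": ("old", 10.0), "stats:2": ("old", 100.0)}
    print(refresh_expired(cache, lambda k: "fresh " + k, ttl=60.0, now=50.0))
    # prints ['stats:1']; stats:2 has not expired yet
```

Overwriting instead of deleting mirrors the point above. The expensive regeneration happens off the request path, and readers never see a miss.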
Invalidation on read
Instead of invalidating stale data when you change the source data from which it's
derived, you can store some information that lets you determine whether the data
has expired when you read it from the cache. This has a significant advantage over
explicit invalidation: it has a fixed cost that you can spread out over time. Suppose
you invalidate an object upon which a million cached objects depend. If you invalidate
on write, you have to invalidate a million things in the cache in one hit,
which could take a long time even if you have an efficient way to find them. If you
invalidate on read, the write can complete immediately, and each of a million reads
will be delayed slightly. This spreads out the cost of the invalidation and helps
avoid spikes of load and long latencies.
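The following toy model in Python shows where the cost lands. It assumes dictionaries stand in for the cache, and each `on_source_change` method returns how many cache writes the change costs at write time. `EagerCache`, `LazyCache`, and the per-source stamp counter are hypothetical names, not part of any library.

```python
from typing import Dict, List, Optional, Tuple


class EagerCache:
    """Invalidate on write: a source change deletes every dependent entry."""

    def __init__(self) -> None:
        self.entries: Dict[str, object] = {}
        self.dependents: Dict[str, List[str]] = {}  # source -> dependent keys

    def on_source_change(self, source: str) -> int:
        ops = 0
        for key in self.dependents.get(source, []):
            self.entries.pop(key, None)  # one cache operation per dependent
            ops += 1
        return ops  # cost grows with the number of dependents


class LazyCache:
    """Invalidate on read: a source change only records a new stamp."""

    def __init__(self) -> None:
        # key -> (value, source, stamp of the source when the value was stored)
        self.entries: Dict[str, Tuple[object, str, int]] = {}
        self.stamps: Dict[str, int] = {}  # source -> change counter

    def put(self, key: str, value: object, source: str) -> None:
        self.entries[key] = (value, source, self.stamps.get(source, 0))

    def on_source_change(self, source: str) -> int:
        self.stamps[source] = self.stamps.get(source, 0) + 1
        return 1  # one write, however many entries depend on the source

    def get(self, key: str) -> Optional[object]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, source, stamp = entry
        # The expiry check is paid here, a little on each read.
        return value if stamp == self.stamps.get(source, 0) else None
```

With 1,000 entries depending on one source, `EagerCache.on_source_change` performs 1,000 deletions in a single call. `LazyCache.on_source_change` performs one write, and each later `get` pays a small comparison instead.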
One of the simplest ways to do invalidation on read is with object versioning. With this
approach, when you store an object in the cache, you also store the current version
number or timestamp of the data upon which it depends. For example, suppose you're
caching statistics about a user's blog posts, including the number of posts the user has
made. When you cache the blog_stats object, you store the user's current version
number with it, because the statistics are dependent on the user.
Whenever you update some data that also depends on the user, you update the user's
version number. Suppose the user's version is initially 0, and you generate and cache
the statistics. When the user publishes a blog post, you increase the user's version to
1 (you'd store this with the blog post too, though we don't really need it for this
example). Then, when you need to display the statistics, you compare the cached
blog_stats object's version to the cached user's version. Because the user's version is
greater than the object's version, you know that the statistics are stale and you need to
recompute them.
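Here is a minimal sketch of this object-versioning scheme in Python. It assumes plain dictionaries stand in for the cache and the source data. The class `BlogStatsCache`, its methods, and the `recomputes` counter are hypothetical names used only for illustration. Cached `blog_stats` carry the user version they were built from, and a read treats them as stale when the user's current version is greater.

```python
from typing import Dict, List, Tuple


class BlogStatsCache:
    """Object versioning: cached stats carry the user version they came from."""

    def __init__(self) -> None:
        self.user_versions: Dict[int, int] = {}       # user_id -> version
        self.posts: Dict[int, List[str]] = {}         # source data
        self.cache: Dict[int, Tuple[int, dict]] = {}  # user_id -> (version, blog_stats)
        self.recomputes = 0                           # counts expensive rebuilds

    def _version(self, user_id: int) -> int:
        return self.user_versions.get(user_id, 0)

    def _bump(self, user_id: int) -> None:
        # Any change to data that depends on the user increments the version.
        self.user_versions[user_id] = self._version(user_id) + 1

    def publish_post(self, user_id: int, title: str) -> None:
        self.posts.setdefault(user_id, []).append(title)
        self._bump(user_id)

    def edit_post(self, user_id: int, index: int, title: str) -> None:
        self.posts[user_id][index] = title
        self._bump(user_id)

    def blog_stats(self, user_id: int) -> dict:
        current = self._version(user_id)
        entry = self.cache.get(user_id)
        if entry is not None and entry[0] >= current:
            return entry[1]  # cached version is current: reuse it
        self.recomputes += 1  # stale or missing: rebuild
        stats = {"num_posts": len(self.posts.get(user_id, []))}
        self.cache[user_id] = (current, stats)
        return stats
```

Tracing the example: the first `blog_stats(1)` builds `{'num_posts': 0}` at version 0. `publish_post` raises the version to 1, so the next read rebuilds `{'num_posts': 1}`, and a repeated read is served from the cache. Calling `edit_post` raises the version to 2 and forces another rebuild even though `num_posts` is still 1. That extra rebuild is the coarseness discussed next.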
This is a pretty coarse way to invalidate content, because it assumes that every bit of
data that's dependent on the user also interacts with all other data. That's not always
true. If a user edits a blog post, for example, you'll increment the user's version, and
that will invalidate the stored statistics even though the statistics (the number of blog
posts) didn't really change. The trade-off is simplicity. A simple cache invalidation
policy isn't just easier to build; it might be more efficient, too.
Object versioning is a simplified approach to a tagged cache, which can handle more
complex dependencies. A tagged cache knows about different kinds of dependencies
and tracks versions separately for each of them. To return to the book club example
from Chapter 11, you could make the cached comments dependent on the user's version
and the book's version by tagging the comments with these version numbers:
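Here is a minimal sketch of a tagged cache in Python. It assumes a dictionary stands in for the cache server. The class `TaggedCache` and tag names such as `user:7` and `book:42` are hypothetical. Each entry stores a snapshot of the version of every tag it depends on, and a read treats the entry as stale if any of those tags has been bumped since.

```python
from typing import Dict, List, Optional, Tuple


class TaggedCache:
    """Entries record a version per tag; they are fresh only if all tags match."""

    def __init__(self) -> None:
        self.tag_versions: Dict[str, int] = {}  # tag -> current version
        # key -> (tag -> version at store time, cached value)
        self.entries: Dict[str, Tuple[Dict[str, int], object]] = {}

    def version(self, tag: str) -> int:
        return self.tag_versions.get(tag, 0)

    def bump(self, tag: str) -> None:
        # Lazily invalidates every entry tagged with `tag`; nothing is deleted.
        self.tag_versions[tag] = self.version(tag) + 1

    def set(self, key: str, value: object, tags: List[str]) -> None:
        snapshot = {tag: self.version(tag) for tag in tags}
        self.entries[key] = (snapshot, value)

    def get(self, key: str) -> Optional[object]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        snapshot, value = entry
        if any(self.version(tag) != v for tag, v in snapshot.items()):
            return None  # a dependency changed since the entry was stored
        return value
```

Suppose comments are stored with `tags=["user:7", "book:42"]`. Bumping an unrelated tag such as `user:99` leaves them fresh, while bumping `book:42` makes the next `get` return `None`, so the caller regenerates them. Unlike the single user version above, each dependency invalidates only the entries actually tagged with it.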