On-disk caches
Disks are slow, so caching on disk is best for persistent objects, objects that are
hard to fit in memory, or static content (pregenerated custom images, for example).
One very useful trick with on-disk caches and web servers is to use 404 error handlers
to catch cache misses. Suppose your web application shows a custom-generated image
in the header, based on the user's name (“Welcome back, John!”). You can refer to the
image as /images/welcomeback/john.jpg. If the image doesn't exist, it will cause a 404
error and trigger the error handler. The error handler can generate the image, store
it on the disk, and either issue a redirect or just stream the image back to the
browser. Further requests will just return the image from the file.
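As a rough illustration of the 404-handler trick, here is a minimal Python sketch of the miss-handling side. It assumes the web server serves existing files directly and routes 404s under /images/welcomeback/ to handle_404; render_welcome_image is a hypothetical placeholder for real image generation, and all names here are illustrative rather than part of any particular framework.

```python
import os
import re
import tempfile
from typing import Optional


def render_welcome_image(username: str) -> bytes:
    """Hypothetical stand-in for the real generator.

    A real handler would draw "Welcome back, <name>!" into a JPEG here;
    this sketch just returns placeholder bytes.
    """
    return f"welcome back, {username}!".encode("utf-8")


def handle_404(cache_dir: str, request_path: str,
               prefix: str = "/images/welcomeback/") -> Optional[bytes]:
    """Treat a 404 under `prefix` as a cache miss: generate, store, return bytes.

    Returns None for paths this handler does not own (a genuine 404).
    """
    if not request_path.startswith(prefix) or not request_path.endswith(".jpg"):
        return None
    username = request_path[len(prefix):-len(".jpg")]
    # Refuse anything that could escape the cache directory (e.g. "../").
    if not re.fullmatch(r"[A-Za-z0-9_-]+", username):
        return None
    data = render_welcome_image(username)
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a temporary file, then rename it into place, so a concurrent
    # request never sees a half-written image.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the web server read it
    os.replace(tmp_path, os.path.join(cache_dir, username + ".jpg"))
    return data  # stream these bytes back instead of redirecting
```

The rename means a concurrent request sees either no file or the complete image, never a partial one, and the username check keeps a crafted URL from writing outside the cache directory.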
You can use this trick for many types of content. For example, instead of caching
the latest headlines as a block of HTML, you can store them in a JavaScript file and
then refer to /latest_headlines.js in the web page's header.
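A sketch of regenerating the headlines file whenever the headlines change, assuming the page header loads it with a script tag and that the page's own rendering code reads a JavaScript global named latestHeadlines (a hypothetical name):

```python
import json
import os
import tempfile
from typing import List


def write_headlines_js(path: str, headlines: List[str]) -> None:
    """Regenerate the cached latest_headlines.js file atomically."""
    # json.dumps yields a valid JavaScript array literal with strings escaped.
    body = "var latestHeadlines = " + json.dumps(headlines) + ";\n"
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".js")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(body)
    os.chmod(tmp_path, 0o644)  # readable by the web server
    os.replace(tmp_path, path)  # readers see the old or new file, never a mix
```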
Cache invalidation is easy: just delete the file. You can implement TTL invalidation
by running a periodic job that deletes files created more than N minutes ago. And
if you want to limit the cache size, you can implement a least recently used (LRU)
invalidation policy by deleting files in order of their last access time.
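The periodic TTL job and the size-limited LRU policy could look roughly like this. The sketch assumes all cached files live flat in one directory and are written once and not modified afterward, so the modification time stands in for the creation time; the function names are hypothetical.

```python
import os
import time


def _cache_files(cache_dir):
    """Return (path, stat_result) pairs for the regular files in cache_dir."""
    with os.scandir(cache_dir) as entries:
        return [(e.path, e.stat(follow_symlinks=False))
                for e in entries if e.is_file(follow_symlinks=False)]


def _remove(path):
    """Delete a file, ignoring it if another process already removed it."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def purge_expired(cache_dir, ttl_seconds, now=None):
    """TTL invalidation: delete files written more than ttl_seconds ago."""
    now = time.time() if now is None else now
    removed = []
    for path, st in _cache_files(cache_dir):
        # mtime stands in for creation time because cache files are not rewritten.
        if now - st.st_mtime > ttl_seconds:
            _remove(path)
            removed.append(path)
    return removed


def evict_lru(cache_dir, max_bytes):
    """LRU invalidation: delete least recently accessed files until under max_bytes."""
    files = sorted(_cache_files(cache_dir), key=lambda item: item[1].st_atime)
    total = sum(st.st_size for _, st in files)
    removed = []
    for path, st in files:
        if total <= max_bytes:
            break
        _remove(path)
        total -= st.st_size
        removed.append(path)
    return removed
```

Both functions would typically run from cron or a similar scheduler. evict_lru relies on st_atime, which is only meaningful if the filesystem actually records access times.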
Invalidation based on last access time requires the filesystem to record access times,
so you must enable the access time option in its mount options. (You actually do this
by omitting the noatime mount option.) If you do this, you should use an in-memory
filesystem, because recording access times turns every read into a metadata write and
would otherwise cause a lot of disk activity.
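On Linux you can check how the cache directory's filesystem is mounted by reading /proc/mounts. The sketch below is Linux-specific and assumes mountpoints without spaces (/proc/mounts encodes those as octal escapes such as \040, which this code does not decode). It also distinguishes relatime, the default on Linux since kernel 2.6.30: with relatime the kernel updates the access time only when it is older than the modification or change time, or more than a day old. That is too coarse for LRU ordering, so the strictatime mount option may be needed as well. The function names are hypothetical.

```python
import os


def mount_options(path, mounts_text):
    """Return (mountpoint, fstype, options) for the filesystem holding `path`.

    `mounts_text` is the contents of /proc/mounts: one mount per line,
    fields "device mountpoint fstype options dump pass".
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mountpoint, fstype, options = fields[1], fields[2], fields[3].split(",")
        prefix = mountpoint.rstrip("/") + "/"
        # The longest matching mountpoint wins; ">=" lets a later (stacked)
        # mount on the same point override an earlier one.
        if (path == mountpoint or path.startswith(prefix)) and (
                best is None or len(mountpoint) >= len(best[0])):
            best = (mountpoint, fstype, options)
    return best


def atime_mode(options):
    """Classify how access times are maintained, from a list of mount options."""
    if "noatime" in options:
        return "noatime"    # never updated: atime-based LRU cannot work
    if "relatime" in options:
        return "relatime"   # updated only when stale: too coarse for LRU
    return "strictatime"    # updated on every read
```

For example, with open("/proc/mounts") as f: print(atime_mode(mount_options("/var/cache/app", f.read())[2])), where /var/cache/app is a hypothetical cache directory.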
Cache Control Policies
Caches create the same problem as denormalizing your database design: they duplicate
data, which means there are multiple places to update the data, and you have to figure
out how to avoid reading stale data. The following are several of the most common
cache control policies:
TTL (time to live)
The cached object is stored with an expiration date; you can either remove the
object with a purge process when that date arrives, or leave it until the next time
something accesses it (at which time you should replace it with a fresh version).
This invalidation policy is best for data that changes rarely or doesn't have to be
fresh.
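The two TTL variants, a purge process versus replacing the object lazily on its next access, can be sketched with a small in-memory class. The class and method names are hypothetical, and loader stands for whatever code rebuilds the object from the source data.

```python
import time


class TTLCache:
    """In-memory sketch of TTL invalidation supporting both purge styles."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self._data = {}      # key -> (value, expires_at)

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key, loader):
        """Return the cached value, replacing it via loader() if missing or expired."""
        entry = self._data.get(key)
        if entry is None or self.clock() >= entry[1]:
            value = loader()     # lazy style: refresh on the next access
            self.set(key, value)
            return value
        return entry[0]

    def purge(self):
        """Purge-process style: drop every expired entry; return how many went."""
        now = self.clock()
        expired = [k for k, (_, exp) in self._data.items() if now >= exp]
        for k in expired:
            del self._data[k]
        return len(expired)
```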
Explicit invalidation
If stale data is not acceptable, the process that updates the source data can
invalidate the old version in the cache. There are two variations of this policy:
write-invalidate and write-update. The write-invalidate policy is simple: you just
mark the cached data as expired (and optionally purge it from the cache). The
write-update policy involves a little more work, because you have to replace the old
cache entry with the updated data. However, it can be very beneficial, especially if it is