Figure 2.5 To get a feel for how expensive it is to access your hard drive compared to finding an item in RAM cache, think of how long it would take you to pick up an item in your backyard (RAM). Then think of how long it would take to drive to a location in your neighborhood (SSD), and finally think of how long it would take to pick up an item in Los Angeles if you lived in Chicago (HDD). Finding a query result in a local cache is far more efficient than rerunning an expensive query that needs HDD access.
You can do a few trillion calculations while you're waiting for your data to get back from LA. That's why calculating a hash is much faster than going to disk, and the more RAM you have, the lower the probability you need to make that long round trip.
The key to faster systems is to keep as much of the right information in RAM as you can, and to check your local servers to see whether they might already have a copy. This fast local data store is often called a RAM cache or memory cache. Yet accomplishing this, and determining when the cached data is no longer current, turns out to be a difficult problem.
Many memory caches use a simple timestamp for each block of memory in the cache as a way of keeping the most recently used objects in memory. When memory fills up, the timestamps are used to determine which items in memory are the oldest and should be overwritten. A more refined approach also takes into account how much time and how many resources it would take to re-create a dataset and store it in memory. This "cost model" allows more expensive queries to be kept in RAM longer than similar items that could be regenerated much faster.
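To make the cost model concrete, here is a minimal sketch, in Python, of an eviction policy that combines a last-used timestamp with an estimated rebuild cost. The class name, the scoring formula, and the rebuild_cost parameter are illustrative assumptions, not the policy of any particular database.

```python
import time


class CostAwareCache:
    """Toy in-memory cache: evicts the entry that is cheapest to lose.

    Each entry records when it was last used and an estimated cost to
    re-create it (for example, seconds needed to re-run the query).
    The eviction score divides staleness by cost, so old results that
    are easy to rebuild are dropped first, while expensive queries
    stay in RAM longer.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # key -> (value, last_used, rebuild_cost)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, _, cost = entry
        self.entries[key] = (value, time.time(), cost)  # refresh the timestamp
        return value

    def put(self, key, value, rebuild_cost):
        if key not in self.entries and len(self.entries) >= self.capacity:
            now = time.time()
            # Evict the entry with the highest staleness per unit of rebuild cost.
            victim = max(
                self.entries,
                key=lambda k: (now - self.entries[k][1]) / max(self.entries[k][2], 1e-9),
            )
            del self.entries[victim]
        self.entries[key] = (value, time.time(), rebuild_cost)
```

A plain least-recently-used policy is the special case in which every entry is assumed to have the same rebuild cost.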
The effective use of a RAM cache is predicated on efficiently answering the question, "Have we run this query before?" or, equivalently, "Have we seen this document before?" These questions can be answered by using consistent hashing, which lets you know if an item is already in the cache or if you need to retrieve it from SSD or HDD.
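As a rough sketch of that check, you can hash the query text into a fixed-size key and look the key up in an in-memory table before touching disk. The SHA-1 digest, the normalization step, and the function names below are illustrative assumptions, not a specific product's API.

```python
import hashlib

ram_cache = {}  # hash key -> cached query result


def query_key(query_text):
    """Derive a fixed-size cache key by hashing the normalized query text."""
    normalized = " ".join(query_text.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def run_query(query_text, execute):
    """Return a cached result if we've seen this query before;
    otherwise run the slow, disk-bound query and cache its result."""
    key = query_key(query_text)
    if key in ram_cache:          # cheap hash lookup in RAM
        return ram_cache[key]
    result = execute(query_text)  # expensive SSD/HDD work
    ram_cache[key] = result
    return result
```

Because the hash is computed entirely in RAM, checking "have we seen this before?" costs almost nothing compared with the disk access it can avoid.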
2.4 Using consistent hashing to keep your cache current
You've learned how important it is to keep frequently used data in your RAM cache, and how avoiding unnecessary disk access can improve your database performance. NoSQL systems expand on this concept and use a technique called consistent hashing to keep the most frequently used data in your cache.
Consistent hashing is a general-purpose process that's useful when evaluating how NoSQL systems work. Consistent hashing quickly tells you if a new query or document is already in your cache.
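In its classic form, consistent hashing places cache servers and keys on the same hash ring: a key is handled by the first node clockwise from its hash position, so adding or removing a node remaps only a small fraction of keys instead of invalidating the whole cache. The sketch below assumes that ring form; the node names, the MD5 hash, and the replica count are illustrative choices.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping keys to cache nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual nodes per server smooth out the balance
        self.ring = []            # sorted list of (hash position, node name)
        for node in nodes:
            self.add_node(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def node_for(self, key):
        """Walk clockwise from the key's hash to the first node position."""
        position = self._hash(key)
        index = bisect.bisect(self.ring, (position,)) % len(self.ring)
        return self.ring[index][1]


# Hypothetical usage: route a query's cache key to one of three cache servers.
ring = HashRing(["cache-1", "cache-2", "cache-3"])
node = ring.node_for("SELECT * FROM orders WHERE id = 42")
```

Because each node owns many small arcs of the ring, removing one server spreads its keys across the survivors rather than forcing every cached item to move.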