These special-purpose cache requirements are why you'll have to balance your cache
hierarchy to suit the particular access patterns of a database server. Because the registers
and on-chip caches are not user-configurable, main memory and storage are the only
levels of the hierarchy you can change.
Random Versus Sequential I/O
Database servers use both sequential and random I/O, and random I/O benefits the
most from caching. You can convince yourself of this by thinking about a typical mixed
workload, with some balance of single-row lookups and multirow range scans. The
typical pattern is for the “hot” data to be randomly distributed; caching this data will
therefore help avoid expensive disk seeks. In contrast, sequential reads generally go
through the data only once, so it's useless to cache it unless it fits completely in memory.
Another reason sequential reads don't benefit much from caching is that they are
faster than random reads. There are two reasons for this:
Sequential I/O is faster than random I/O.
Sequential operations are performed faster than random operations, both in memory
and on disk. Suppose your disks can do 100 random I/O operations per second
and can read 50 MB per second sequentially (that's roughly what a consumer-grade
disk can achieve today). If you have 100-byte rows, that's 100 rows per second
randomly, versus 500,000 rows per second sequentially—a difference of 5,000
times, or several orders of magnitude. Thus, the random I/O benefits more from
caching in this scenario.
Accessing in-memory rows sequentially is also faster than accessing in-memory
rows randomly. Today's memory chips can typically access about 250,000 100-byte
rows per second randomly, and 5 million per second sequentially. Note that
random accesses are some 2,500 times faster in memory than on disk, while
sequential accesses are only 10 times faster in memory.
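The arithmetic behind these ratios can be checked in a few lines. The throughput figures below are the illustrative numbers quoted above, not measurements of any particular hardware:

```python
# Back-of-the-envelope rows-per-second arithmetic, using the
# illustrative figures from the text (not measured values).

ROW_SIZE = 100  # bytes per row

# Disk: 100 random IOPS vs. 50 MB/s sequential throughput
disk_random_rows = 100                       # one row per random I/O
disk_seq_rows = 50 * 1_000_000 // ROW_SIZE   # 500,000 rows/s

# Memory: illustrative figures from the text
mem_random_rows = 250_000
mem_seq_rows = 5_000_000

print(f"disk: {disk_random_rows} random vs. {disk_seq_rows} sequential rows/s "
      f"({disk_seq_rows // disk_random_rows}x difference)")
print(f"memory speedup over disk: {mem_random_rows // disk_random_rows}x random, "
      f"{mem_seq_rows // disk_seq_rows}x sequential")
# → disk: 100 random vs. 500000 sequential rows/s (5000x difference)
# → memory speedup over disk: 2500x random, 10x sequential
```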
Storage engines can perform sequential reads faster than random reads.
A random read generally means that the storage engine must perform index
operations. (There are exceptions to this rule, but it's true for InnoDB and MyISAM.)
That usually requires navigating a B-Tree data structure and comparing values to
other values. In contrast, sequential reads generally require traversing a simpler
data structure, such as a linked list. That's a lot less work, so again, sequential reads
are faster.
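The cost difference can be sketched with stand-ins: a sorted list searched with binary search plays the role of a B-Tree index (each random lookup costs O(log n) comparisons), while a sequential scan simply visits each row once. This is an illustration of the principle, not InnoDB's or MyISAM's actual code:

```python
import bisect

# A sorted list of even keys stands in for an index structure.
rows = list(range(0, 1_000_000, 2))

def random_lookup(key):
    """Find one row by key, as a random read through an index would:
    roughly log2(n) comparisons per lookup."""
    i = bisect.bisect_left(rows, key)
    return rows[i] if i < len(rows) and rows[i] == key else None

def sequential_scan():
    """Visit every row once, as a sequential read would:
    a simple linear traversal, no per-row searching."""
    total = 0
    for r in rows:
        total += r
    return total

assert random_lookup(424242) == 424242   # even key: present
assert random_lookup(424243) is None     # odd key: absent
```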
Finally, random reads are typically executed to find individual rows, but the read isn't
just one row—it is a whole page of data, most of which isn't needed. That's a lot of
wasted work. A sequential read, on the other hand, typically happens when you want
all of the rows on the page, so it's much more cost-effective.
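To put a number on that waste: with InnoDB's default 16 KB page size and the 100-byte rows used in the earlier examples, a single-row random read fetches a page holding about 163 rows but uses only one of them:

```python
# How much of a fetched page a single-row random read actually uses.
# 16 KB is InnoDB's default page size; 100-byte rows as in the text.
PAGE_SIZE = 16 * 1024
ROW_SIZE = 100

rows_per_page = PAGE_SIZE // ROW_SIZE    # rows that fit in one page
useful_fraction = ROW_SIZE / PAGE_SIZE   # bytes wanted / bytes fetched

print(f"{rows_per_page} rows per page; a single-row random read uses "
      f"{useful_fraction:.1%} of the bytes fetched")
# → 163 rows per page; a single-row random read uses 0.6% of the bytes fetched
```

A sequential scan of the same page, by contrast, can use all ~163 rows, which is why it amortizes the I/O cost so much better.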
 