These special-purpose cache requirements are why you'll have to balance your cache
hierarchy to suit the particular access patterns of a database server. Because the registers
and on-chip caches are not user-configurable, main memory and storage are the only
levels of the hierarchy you can change.
Random Versus Sequential I/O
Database servers use both sequential and random I/O, and random I/O benefits the
most from caching. You can convince yourself of this by thinking about a typical mixed
workload, with some balance of single-row lookups and multirow range scans. The
typical pattern is for the “hot” data to be randomly distributed; caching this data will
therefore help avoid expensive disk seeks. In contrast, sequential reads generally go
through the data only once, so it's useless to cache it unless it fits completely in memory.
Another reason sequential reads don't benefit much from caching is that they are
faster than random reads. There are two reasons for this:
Sequential I/O is faster than random I/O.
Sequential operations are performed faster than random operations, both in memory
and on disk. Suppose your disks can do 100 random I/O operations per second
and can read 50 MB per second sequentially (that's roughly what a consumer-grade
disk can achieve today). If you have 100-byte rows, that's 100 rows per second
randomly, versus 500,000 rows per second sequentially—a difference of 5,000
times, or several orders of magnitude. Thus, the random I/O benefits more from
caching in this scenario.
Accessing in-memory rows sequentially is also faster than accessing in-memory
rows randomly. Today's memory chips can typically access about 250,000 100-byte
rows per second randomly, and 5 million per second sequentially. Note that
random accesses are some 2,500 times faster in memory than on disk, while
sequential accesses are only 10 times faster in memory.
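The arithmetic behind these ratios can be checked in a few lines. The throughput figures below are the illustrative numbers quoted above, not measurements of any particular hardware:

```python
# Back-of-the-envelope rows-per-second arithmetic, using the
# illustrative figures from the text (not measured values).

ROW_SIZE = 100  # bytes per row

# Disk: 100 random IOPS vs. 50 MB/s sequential throughput
disk_random_rows = 100                       # one row per random I/O
disk_seq_rows = 50 * 1_000_000 // ROW_SIZE   # 500,000 rows/s

# Memory: illustrative figures from the text
mem_random_rows = 250_000
mem_seq_rows = 5_000_000

print(f"disk: {disk_random_rows} random vs. {disk_seq_rows} sequential rows/s "
      f"({disk_seq_rows // disk_random_rows}x difference)")
print(f"memory speedup over disk: {mem_random_rows // disk_random_rows}x random, "
      f"{mem_seq_rows // disk_seq_rows}x sequential")
# → disk: 100 random vs. 500000 sequential rows/s (5000x difference)
# → memory speedup over disk: 2500x random, 10x sequential
```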
Storage engines can perform sequential reads faster than random reads.
A random read generally means that the storage engine must perform index
operations. (There are exceptions to this rule, but it's true for InnoDB and MyISAM.)
That usually requires navigating a B-Tree data structure and comparing values to
other values. In contrast, sequential reads generally require traversing a simpler
data structure, such as a linked list. That's a lot less work, so again, sequential reads
are faster.
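The cost difference can be sketched with stand-ins: a sorted list searched with binary search plays the role of a B-Tree index (each random lookup costs O(log n) comparisons), while a sequential scan simply visits each row once. This is an illustration of the principle, not InnoDB's or MyISAM's actual code:

```python
import bisect

# A sorted list of even keys stands in for an index structure.
rows = list(range(0, 1_000_000, 2))

def random_lookup(key):
    """Find one row by key, as a random read through an index would:
    roughly log2(n) comparisons per lookup."""
    i = bisect.bisect_left(rows, key)
    return rows[i] if i < len(rows) and rows[i] == key else None

def sequential_scan():
    """Visit every row once, as a sequential read would:
    a simple linear traversal, no per-row searching."""
    total = 0
    for r in rows:
        total += r
    return total

assert random_lookup(424242) == 424242   # even key: present
assert random_lookup(424243) is None     # odd key: absent
```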
Finally, random reads are typically executed to find individual rows, but the read isn't
just one row—it is a whole page of data, most of which isn't needed. That's a lot of
wasted work. A sequential read, on the other hand, typically happens when you want
all of the rows on the page, so it's much more cost-effective.
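To put a number on that waste: with InnoDB's default 16 KB page size and the 100-byte rows used in the earlier examples, a single-row random read fetches a page holding about 163 rows but uses only one of them:

```python
# How much of a fetched page a single-row random read actually uses.
# 16 KB is InnoDB's default page size; 100-byte rows as in the text.
PAGE_SIZE = 16 * 1024
ROW_SIZE = 100

rows_per_page = PAGE_SIZE // ROW_SIZE    # rows that fit in one page
useful_fraction = ROW_SIZE / PAGE_SIZE   # bytes wanted / bytes fetched

print(f"{rows_per_page} rows per page; a single-row random read uses "
      f"{useful_fraction:.1%} of the bytes fetched")
# → 163 rows per page; a single-row random read uses 0.6% of the bytes fetched
```

A sequential scan of the same page, by contrast, can use all ~163 rows, which is why it amortizes the I/O cost so much better.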
 