You can imagine the database as a desk with filing drawers. The working set consists
of the papers you need to have on the desktop to get your work done. The desktop is
main memory in this analogy, while the filing drawers are the hard disks.
Just as you don't need to have every piece of paper on the desktop to get your work
done, you don't need the whole database to fit in memory for optimal performance—
just the working set.
The working set's size varies greatly depending on the application. For some applications the working set might be 1% of the total data size, while for others it could be close to 100%. When the working set doesn't fit in memory, the database server will have to shuffle data between the disk and memory to get its work done. This is why a memory shortage might look like an I/O problem. Sometimes there's no way you can fit your entire working set in memory, and sometimes you don't actually want to (for example, if your application needs a lot of sequential I/O). Your application architecture can change a lot depending on whether you can fit the working set in memory.
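The shuffling effect can be sketched with a toy LRU buffer pool (a hypothetical simulation, not MySQL code): when the cache is at least as large as the working set, almost every access is a hit, but shrink the cache below the working set and misses (each one a disk read) climb sharply.

```python
from collections import OrderedDict
import random

def miss_rate(cache_size, accesses):
    """Replay page accesses through a toy LRU cache; return the miss rate."""
    cache = OrderedDict()
    misses = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)          # hit: mark as most recently used
        else:
            misses += 1                      # miss: would be a disk read
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)    # evict least recently used page
    return misses / len(accesses)

# A working set of 100 hot pages, accessed at random.
random.seed(42)
trace = [random.randrange(100) for _ in range(10_000)]

print(miss_rate(150, trace))   # cache larger than working set: only cold misses
print(miss_rate(50, trace))    # cache half the working set: misses soar
```

The simulation is crude (real buffer pools are not pure LRU), but it shows the cliff: the second call does roughly fifty times more "disk I/O" than the first, purely because the cache no longer covers the working set.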
The working set can be defined as a time-based percentile. For example, the 95th percentile one-hour working set is the set of pages that the database uses during one hour, except for the 5% of pages that are least frequently used. A percentile is the most useful way to think about this, because you might need to access only 1% of your data every hour, but over a 24-hour period that might add up to around 20% of the distinct pages in the whole database. It might be more intuitive to think of the working set in terms of how much data you need to have cached, so your workload is mostly CPU-bound. If you can't cache enough data, your working set doesn't fit in memory.
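The percentile definition above can be sketched in a few lines (assuming you have a page-access trace for the window, which real instrumentation would have to supply): count how often each page is touched, then drop the least frequently used 5% of the distinct pages.

```python
from collections import Counter

def percentile_working_set(accesses, pct=95):
    """Distinct pages touched in the window, minus the least
    frequently used (100 - pct)% of those pages."""
    freq = Counter(accesses)
    # Rank pages from most to least frequently used.
    ranked = [page for page, _ in freq.most_common()]
    keep = max(1, round(len(ranked) * pct / 100))
    return set(ranked[:keep])

# One hour of accesses: pages 0-9 are hot, pages 10-29 are touched once each.
hour = [p for p in range(10) for _ in range(100)] + list(range(10, 30))
ws = percentile_working_set(hour, pct=95)
print(len(ws))  # 30 distinct pages minus the coldest 5% -> 28 pages
```

All the hot pages survive the cut; only the coldest stragglers are excluded, which is exactly the point of using a percentile rather than the raw set of pages touched.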
You should think about the working set in terms of the most frequently used set of pages, not the most frequently read or written set of pages. This means that determining the working set requires instrumentation inside the application; you cannot just look at external usage such as I/O accesses, because I/O to the pages is not the same thing as logical access to the pages. MySQL might read a page into memory and then access it millions of times, but you'll see only one I/O operation if you're looking at strace, for example. The lack of instrumentation needed for determining the working set is probably the biggest reason that there isn't a lot of research into this topic.
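The gap between logical access and visible I/O can be sketched with a toy buffer pool (names invented for illustration): the page is read from disk exactly once, but logically touched a million times, so an external I/O counter undercounts its hotness by six orders of magnitude.

```python
class Page:
    """Toy page tracking one physical read vs. many logical accesses."""
    def __init__(self):
        self.io_reads = 0        # what strace-style observation would see
        self.logical_hits = 0    # visible only inside the server

buffer_pool = {}

def access(page_id):
    """Logically access a page, reading it from 'disk' only on a cache miss."""
    if page_id not in buffer_pool:
        page = Page()
        page.io_reads += 1       # the single disk read an outside observer sees
        buffer_pool[page_id] = page
    buffer_pool[page_id].logical_hits += 1

for _ in range(1_000_000):
    access(7)                    # one hot page, accessed a million times

p = buffer_pool[7]
print(p.io_reads, p.logical_hits)  # 1 physical read, 1000000 logical hits
```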
The working set consists of both data and indexes, and you should count it in cache units. A cache unit is the smallest unit of data that the storage engine works with. The size of the cache unit varies between storage engines, and therefore so does the size of the working set. For example, InnoDB works in pages of 16 KB by default. If you do a single-row lookup and InnoDB has to go to disk to get it, it'll read the entire page containing that row into the buffer pool and cache it there. This can be wasteful. Suppose you have 100-byte rows that you access randomly. InnoDB will use a lot of extra memory in the buffer pool for these rows, because it will have to read and cache a complete 16 KB page for each row. Because the working set includes indexes too, InnoDB will also read and cache the parts of the index tree it needed to find the row. InnoDB's index pages are also 16 KB in size, which means it might have to store a total
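The memory amplification from whole-page caching works out like this (a back-of-the-envelope sketch; 16 KB is InnoDB's default page size, 100 bytes is the example's row size, and the worst case assumed here is that every hot row lands on a distinct page):

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size, in bytes
ROW_SIZE = 100          # the example's row size, in bytes

# Caching one randomly accessed row pins a whole page in the buffer pool.
amplification = PAGE_SIZE / ROW_SIZE
print(amplification)    # 163.84x more memory than the row data itself

# Worst case: 1 million random hot rows, each on a distinct page.
rows = 1_000_000
useful = rows * ROW_SIZE    # ~95 MB of actual row data
cached = rows * PAGE_SIZE   # ~15.3 GB of buffer pool pages
print(useful / 2**20, cached / 2**30)
```

Index pages add to this on top of the data pages, which is why counting the working set in cache units, not row bytes, gives a much more realistic memory estimate.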