You can imagine the database as a desk with filing drawers. The working set consists
of the papers you need to have on the desktop to get your work done. The desktop is
main memory in this analogy, while the filing drawers are the hard disks.
Just as you don't need to have every piece of paper on the desktop to get your work
done, you don't need the whole database to fit in memory for optimal performance—
just the working set.
The working set's size varies greatly depending on the application. For some applications the working set might be 1% of the total data size, while for others it could be close to 100%. When the working set doesn't fit in memory, the database server will have to shuffle data between the disk and memory to get its work done. This is why a memory shortage might look like an I/O problem. Sometimes there's no way you can fit your entire working set in memory, and sometimes you don't actually want to (for example, if your application needs a lot of sequential I/O). Your application architecture can change a lot depending on whether you can fit the working set in memory.
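The shuffling effect can be sketched with a toy LRU buffer pool (a hypothetical simulation, not MySQL code): when the cache is at least as large as the working set, almost every access is a hit, but shrink the cache below the working set and misses (each one a disk read) climb sharply.

```python
from collections import OrderedDict
import random

def miss_rate(cache_size, accesses):
    """Replay page accesses through a toy LRU cache; return the miss rate."""
    cache = OrderedDict()
    misses = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)          # hit: mark as most recently used
        else:
            misses += 1                      # miss: would be a disk read
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)    # evict least recently used page
    return misses / len(accesses)

# A working set of 100 hot pages, accessed at random.
random.seed(42)
trace = [random.randrange(100) for _ in range(10_000)]

print(miss_rate(150, trace))   # cache larger than working set: only cold misses
print(miss_rate(50, trace))    # cache half the working set: misses soar
```

The simulation is crude (real buffer pools are not pure LRU), but it shows the cliff: the second call does roughly fifty times more "disk I/O" than the first, purely because the cache no longer covers the working set.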
The working set can be defined as a time-based percentile. For example, the 95th percentile one-hour working set is the set of pages that the database uses during one hour, except for the 5% of pages that are least frequently used. A percentile is the most useful way to think about this, because you might need to access only 1% of your data every hour, but over a 24-hour period that might add up to around 20% of the distinct pages in the whole database. It might be more intuitive to think of the working set in terms of how much data you need to have cached, so your workload is mostly CPU-bound. If you can't cache enough data, your working set doesn't fit in memory.
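The percentile definition above can be sketched in a few lines (assuming you have a page-access trace for the window, which real instrumentation would have to supply): count how often each page is touched, then drop the least frequently used 5% of the distinct pages.

```python
from collections import Counter

def percentile_working_set(accesses, pct=95):
    """Distinct pages touched in the window, minus the least
    frequently used (100 - pct)% of those pages."""
    freq = Counter(accesses)
    # Rank pages from most to least frequently used.
    ranked = [page for page, _ in freq.most_common()]
    keep = max(1, round(len(ranked) * pct / 100))
    return set(ranked[:keep])

# One hour of accesses: pages 0-9 are hot, pages 10-29 are touched once each.
hour = [p for p in range(10) for _ in range(100)] + list(range(10, 30))
ws = percentile_working_set(hour, pct=95)
print(len(ws))  # 30 distinct pages minus the coldest 5% -> 28 pages
```

All the hot pages survive the cut; only the coldest stragglers are excluded, which is exactly the point of using a percentile rather than the raw set of pages touched.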
You should think about the working set in terms of the most frequently used set of pages, not the most frequently read or written set of pages. This means that determining the working set requires instrumentation inside the application; you cannot just look at external usage such as I/O accesses, because I/O to the pages is not the same thing as logical access to the pages. MySQL might read a page into memory and then access it millions of times, but you'll see only one I/O operation if you're looking at strace, for example. The lack of instrumentation needed for determining the working set is probably the biggest reason that there isn't a lot of research into this topic.
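The gap between logical access and visible I/O can be sketched with a toy buffer pool (names invented for illustration): the page is read from disk exactly once, but logically touched a million times, so an external I/O counter undercounts its hotness by six orders of magnitude.

```python
class Page:
    """Toy page tracking one physical read vs. many logical accesses."""
    def __init__(self):
        self.io_reads = 0        # what strace-style observation would see
        self.logical_hits = 0    # visible only inside the server

buffer_pool = {}

def access(page_id):
    """Logically access a page, reading it from 'disk' only on a cache miss."""
    if page_id not in buffer_pool:
        page = Page()
        page.io_reads += 1       # the single disk read an outside observer sees
        buffer_pool[page_id] = page
    buffer_pool[page_id].logical_hits += 1

for _ in range(1_000_000):
    access(7)                    # one hot page, accessed a million times

p = buffer_pool[7]
print(p.io_reads, p.logical_hits)  # 1 physical read, 1000000 logical hits
```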
The working set consists of both data and indexes, and you should count it in cache units. A cache unit is the smallest unit of data that the storage engine works with. The size of the cache unit varies between storage engines, and therefore so does the size of the working set. For example, InnoDB works in pages of 16 KB by default. If you do a single-row lookup and InnoDB has to go to disk to get it, it'll read the entire page containing that row into the buffer pool and cache it there. This can be wasteful. Suppose you have 100-byte rows that you access randomly. InnoDB will use a lot of extra memory in the buffer pool for these rows, because it will have to read and cache a complete 16 KB page for each row. Because the working set includes indexes too, InnoDB will also read and cache the parts of the index tree it needed to find the row. InnoDB's index pages are also 16 KB in size, which means it might have to store a total
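The memory amplification from whole-page caching works out like this (a back-of-the-envelope sketch; 16 KB is InnoDB's default page size, 100 bytes is the example's row size, and the worst case assumed here is that every hot row lands on a distinct page):

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size, in bytes
ROW_SIZE = 100          # the example's row size, in bytes

# Caching one randomly accessed row pins a whole page in the buffer pool.
amplification = PAGE_SIZE / ROW_SIZE
print(amplification)    # 163.84x more memory than the row data itself

# Worst case: 1 million random hot rows, each on a distinct page.
rows = 1_000_000
useful = rows * ROW_SIZE    # ~95 MB of actual row data
cached = rows * PAGE_SIZE   # ~15.3 GB of buffer pool pages
print(useful / 2**20, cached / 2**30)
```

Index pages add to this on top of the data pages, which is why counting the working set in cache units, not row bytes, gives a much more realistic memory estimate.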