As a result, you can save work by caching sequential reads, but you can save much more
work by caching random reads instead. In other words, adding memory is the best solution for random-read I/O problems if you can afford it.
Caching, Reads, and Writes
If you have enough memory, you can insulate the disk from read requests completely.
If all your data fits in memory, every read will be a cache hit once the server's caches
are warmed up. There will still be logical reads, but no physical reads. Writes are a
different matter, though. A write can be performed in memory just as a read can, but
sooner or later it has to be written to the disk so it's permanent. In other words, a cache
can delay writes, but caching cannot eliminate writes as it can reads.
In fact, in addition to allowing writes to be delayed, caching can permit them to be
grouped together in two important ways:
Many writes, one flush
A single piece of data can be changed many times in memory without all of the
new values being written to disk. When the data is eventually flushed to disk, all
the modifications that happened since the last physical write are made permanent.
For example, many statements could update an in-memory counter. If the counter
is incremented 100 times and then written to disk, 100 modifications have been
grouped into one write.
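The counter example above can be sketched as a toy write-back cache (a hypothetical illustration, not the server's actual buffer pool): updates land in memory and are marked dirty, and only a flush produces a physical write.

```python
# Hypothetical write-back cache: many logical writes to the same key
# collapse into one physical write per dirty key at flush time.
class WriteBackCache:
    def __init__(self):
        self.memory = {}        # current in-memory values
        self.dirty = set()      # keys modified since the last flush
        self.disk = {}          # simulated durable storage
        self.physical_writes = 0

    def increment(self, key, amount=1):
        self.memory[key] = self.memory.get(key, 0) + amount
        self.dirty.add(key)     # a logical write; nothing hits "disk"

    def flush(self):
        for key in self.dirty:
            self.disk[key] = self.memory[key]   # one physical write per key
            self.physical_writes += 1
        self.dirty.clear()

cache = WriteBackCache()
for _ in range(100):
    cache.increment("counter")   # 100 logical writes
cache.flush()
print(cache.disk["counter"], cache.physical_writes)  # 100 1
```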
I/O merging
Many different pieces of data can be modified in memory and the modifications
can be collected together, so the physical writes can be performed as a single disk
operation.
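I/O merging can be sketched the same way. In this hypothetical example, dirty page numbers accumulate in memory and, at flush time, adjacent pages are coalesced into contiguous runs so that each run can be submitted as a single disk operation:

```python
# Hypothetical sketch of I/O merging: coalesce dirty page numbers into
# contiguous (start, length) runs, one disk operation per run.
def merge_runs(dirty_pages):
    """Group page numbers into contiguous (start, length) runs."""
    runs = []
    for page in sorted(dirty_pages):
        if runs and page == runs[-1][0] + runs[-1][1]:
            # Page extends the previous run: grow it instead of
            # starting a new write.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((page, 1))
    return runs

# Seven dirty pages collapse into three write operations.
print(merge_runs({3, 4, 5, 6, 9, 10, 42}))  # [(3, 4), (9, 2), (42, 1)]
```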
This is why many transactional systems use a write-ahead logging strategy. Write-ahead
logging lets them make changes to the pages in memory without flushing the changes
to disk, which usually involves random I/O and is very slow. Instead, they write a record
of the changes to a sequential log file, which is much faster. A background thread can
flush the modified pages to disk later; when it does, it can optimize the writes.
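A minimal write-ahead-logging sketch (a simplified model, not any real server's implementation) shows the two halves of the strategy: every change is first appended to a sequential log, and a later checkpoint flushes the dirty in-memory pages in an optimized order.

```python
# Minimal WAL sketch: sequential log appends up front, optimized page
# flushes later. The log makes the change durable cheaply; the
# checkpoint turns random page writes into one ordered pass.
class WALStore:
    def __init__(self):
        self.log = []       # sequential append-only log (fast to write)
        self.pages = {}     # in-memory page cache
        self.disk = {}      # simulated data file (slow, random I/O)

    def update(self, page_no, value):
        self.log.append((page_no, value))  # record the change first
        self.pages[page_no] = value        # then modify the page in memory

    def checkpoint(self):
        # Flush dirty pages in page order; afterwards the log records
        # are no longer needed for recovery and can be dropped.
        for page_no in sorted(self.pages):
            self.disk[page_no] = self.pages[page_no]
        self.log.clear()

store = WALStore()
store.update(7, "a")
store.update(2, "b")
store.update(7, "c")
print(len(store.log))   # 3 log records before the checkpoint
store.checkpoint()
print(store.disk)       # {2: 'b', 7: 'c'}
```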
Writes benefit greatly from buffering, because it converts random I/O into more sequential I/O. Asynchronous (buffered) writes are typically handled by the operating
system and are batched so they can be flushed to disk more optimally. Synchronous
(unbuffered) writes have to be written to disk before they finish. That's why they benefit
from buffering in a RAID controller's battery-backed write-back cache (we discuss
RAID a bit later).
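The buffered-versus-synchronous distinction can be seen directly with POSIX calls. In this sketch, a plain write lands in the operating system's cache and returns immediately, while `fsync()` blocks until the data is durable; that wait is the cost a battery-backed write-back cache hides from the application.

```python
import os
import tempfile

# A plain write() is asynchronous from the application's point of view:
# the OS buffers it. fsync() makes it synchronous by waiting until the
# data has actually reached stable storage.
path = os.path.join(tempfile.mkdtemp(), "datafile")

with open(path, "wb") as f:
    f.write(b"buffered write")   # lands in the OS page cache
    f.flush()                    # push Python's buffer to the kernel
    os.fsync(f.fileno())         # block until the disk has the data

with open(path, "rb") as f:
    print(f.read())  # b'buffered write'
```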
What's Your Working Set?
Every application has a “working set” of data—that is, the data that it really needs to
do its work. A lot of databases also have plenty of data that's not in the working set.