The doublewrite buffer
InnoDB uses a doublewrite buffer to avoid data corruption in case of partial page writes.
A partial page write occurs when a disk write doesn't complete fully, and only a portion
of a 16 KB page is written to disk. There are a variety of reasons (crashes, bugs, and so
on) that a page might be partially written to disk. The doublewrite buffer guards against
data corruption if this happens.
The doublewrite buffer is a special reserved area of the tablespace, large enough to hold
100 pages in a contiguous block. It is essentially a backup copy of recently written
pages. When InnoDB flushes pages from the buffer pool to the disk, it writes (and
flushes) them first to the doublewrite buffer, then to the main data area where they
really belong. This ensures that every page write is atomic and durable.
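The write ordering described above can be sketched in a few lines of Python. This is only an illustrative model, not InnoDB's actual code: the function name flush_pages, the use of two separate files, and the page-number-to-bytes dictionary are all assumptions made for the example. (InnoDB normally keeps the doublewrite area inside the tablespace itself.) The point it shows is the ordering: the whole batch is written and synced to the doublewrite area before any page is written to its real location.

```python
import os

PAGE_SIZE = 16 * 1024  # 16 KB pages, as in the text


def flush_pages(doublewrite_path: str, data_path: str, pages: dict) -> None:
    """Write pages (page number -> 16 KB of bytes) in doublewrite order.

    Hypothetical helper for illustration; not an InnoDB API.
    """
    for page in pages.values():
        if len(page) != PAGE_SIZE:
            raise ValueError("every page must be exactly PAGE_SIZE bytes")

    # Step 1: write the whole batch sequentially to the doublewrite area,
    # then make it durable with a single fsync().
    with open(doublewrite_path, "wb") as dw:
        for page_no in sorted(pages):
            dw.write(pages[page_no])
        dw.flush()
        os.fsync(dw.fileno())

    # Step 2: only now write each page to the place where it really belongs.
    mode = "r+b" if os.path.exists(data_path) else "w+b"
    with open(data_path, mode) as data:
        for page_no in sorted(pages):
            data.seek(page_no * PAGE_SIZE)
            data.write(pages[page_no])
        data.flush()
        os.fsync(data.fileno())
```

If the process dies during step 2, an intact copy of every page in the batch is already on disk in the doublewrite file. If it dies during step 1, the data file has not been touched yet.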
Doesn't this mean that every page is written twice? Yes, it does, but because InnoDB
writes several pages to the doublewrite buffer sequentially and only then calls
fsync() to sync them to disk, the performance impact is relatively small—generally a
few percentage points, not double, although the overhead is more noticeable on solid-
state drives, as we'll discuss in the next chapter. More importantly, this strategy allows
the log files to be much more efficient. Because the doublewrite buffer gives InnoDB a
very strong guarantee that the data pages are not corrupt, InnoDB's log records don't
have to contain full pages; they are more like binary deltas to pages.
If there's a partial page write to the doublewrite buffer itself, the original page will still
be on disk in its real location. When InnoDB recovers, it will use the original page
instead of the corrupted copy in the doublewrite buffer. However, if the doublewrite
buffer succeeds and the write to the page's real location fails, InnoDB will use the copy
in the doublewrite buffer during recovery. InnoDB knows when a page is corrupt
because each page has a checksum at the end; the checksum is the last thing to be written,
so if the page's contents don't match the checksum, the page is corrupt. Upon recovery,
therefore, InnoDB just reads each page in the doublewrite buffer and verifies the
checksums. If a page's checksum is incorrect, it reads the page from its original location.
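The recovery rule can be illustrated with a small in-memory model. This is a simplified sketch under stated assumptions, not InnoDB's real page format or recovery code: the 4-byte CRC32 trailer, the dictionaries standing in for the doublewrite buffer and the data file, and the names make_page, is_valid, and recover are all invented for the example.

```python
import zlib

PAGE_SIZE = 16 * 1024   # 16 KB pages, as in the text
CHECKSUM_SIZE = 4       # assumption: a 4-byte CRC32 trailer


def make_page(payload: bytes) -> bytes:
    """Pad the payload and append a CRC32 checksum as the final bytes."""
    body_size = PAGE_SIZE - CHECKSUM_SIZE
    if len(payload) > body_size:
        raise ValueError("payload too large for one page")
    body = payload.ljust(body_size, b"\x00")
    return body + zlib.crc32(body).to_bytes(CHECKSUM_SIZE, "big")


def is_valid(page: bytes) -> bool:
    """Return True if the page is complete and its trailer matches its body."""
    if len(page) != PAGE_SIZE:
        return False
    body, trailer = page[:-CHECKSUM_SIZE], page[-CHECKSUM_SIZE:]
    return zlib.crc32(body).to_bytes(CHECKSUM_SIZE, "big") == trailer


def recover(doublewrite: dict, data_file: dict) -> list:
    """Repair data_file in place; return the page numbers that were restored."""
    restored = []
    for page_no, copy in doublewrite.items():
        if not is_valid(copy):
            # The torn write hit the doublewrite buffer, so the page at its
            # real location was never overwritten. Leave it alone.
            continue
        if not is_valid(data_file.get(page_no, b"")):
            # The torn write hit the real location: restore the intact copy.
            data_file[page_no] = copy
            restored.append(page_no)
    return restored
```

A torn page can be simulated by splicing the first half of a new page onto the second half of the old one, for example new[:PAGE_SIZE // 2] + old[PAGE_SIZE // 2:]. Its trailer still holds the old checksum, so is_valid rejects it. Placing that torn page in data_file makes recover restore the doublewrite copy; placing it in doublewrite makes recover skip it and keep the original page.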
In some cases, the doublewrite buffer really isn't necessary—for example, you might
want to disable it on replicas. Also, some filesystems (such as ZFS) do the same thing
themselves, so it is redundant for InnoDB to do it. You can disable the doublewrite
buffer by setting innodb_doublewrite to 0. In Percona Server, you can configure the
doublewrite buffer to be stored in its own file, so you can separate this workload from
the rest of the server's work by placing it on separate disk drives.
Other I/O configuration options
The sync_binlog option controls how MySQL flushes the binary log to disk. Its default
value is 0, which means MySQL does no flushing and it's up to the operating system
to decide when to flush its cache to durable storage. If the value is greater than 0, it
specifies how many binary log writes happen between flushes to disk (each write is a
 