problem based on intuition and experience. But we also saw something that didn't
make sense. If you look at the iostat output again, in the wsec/s column you can see
that for about six seconds, the server is writing hundreds of megabytes of data per
second to the disks. Each sector is 512 bytes, so those samples show up to 150 MB of
writes per second at times. Yet the entire database is only 900 MB, and the workload
is mostly SELECT queries. How can this happen?
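For reference, converting iostat's wsec/s column into a byte rate is just a multiplication by the sector size. The command and figures below are illustrative rather than the actual samples from this server, and depending on the sysstat version, iostat may report wkB/s or wMB/s instead of wsec/s:

    # Extended per-device statistics, sampled once per second; the wsec/s
    # column is sectors written per second
    $ iostat -dx 1

    # With 512-byte sectors, a burst of roughly 300,000 wsec/s corresponds to
    #   300,000 sectors/s x 512 bytes/sector = 153,600,000 bytes/s, about 150 MB/s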
When you examine a system, try to ask yourself whether there are any things like this
that simply don't add up, and investigate them further. Try to follow each train of
thought to its conclusion, and try not to get sidetracked on too many tangents, or you
could forget about a promising possibility. Write down little notes and cross them off
to help ensure that you've dotted all the Ts.[18]
At this point, we could jump right to a conclusion, and it would be wrong. We see from
the main thread state that InnoDB is trying to flush dirty pages, which generally doesn't
appear in the status output unless flushing is delayed. We know that this version of
InnoDB is prone to the “furious flushing” problem, also called a checkpoint stall. This
is what happens when InnoDB doesn't spread flushing out evenly over time, and it
suddenly decides to force a checkpoint (flush a lot of data) to make up for that. This
can cause serious blocking inside InnoDB, making everything queue and wait to enter
the kernel, and thus pile up at the layers above InnoDB in the server. We showed an
example in Chapter 2 of the periodic drops in performance that can happen when there
is furious flushing. Many of this server's symptoms are similar to what happens during
a forced checkpoint, but it's not the problem in this case. You can prove that in many
ways, perhaps most easily by looking at the SHOW STATUS counters and tracking the
change in the Innodb_buffer_pool_pages_flushed counter, which, as we mentioned
earlier, was not increasing much. In addition, we noted that the buffer pool doesn't
have much dirty data to flush anyway—certainly not hundreds of megabytes. This is
not surprising, because the workload on this server is almost entirely SELECT queries.
We can therefore conclude that instead of blaming the problem on InnoDB's
flushing, we should blame InnoDB's flushing delay on the problem. It is a symptom
(an effect), not a cause. The underlying problem is causing the disks to become so
saturated that InnoDB isn't having any luck getting its I/O tasks done. So we can
eliminate this as a possible cause, and cross off one of our intuition-based ideas.
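One way to run that kind of check, sketched here with generic commands rather than the exact ones used on this server, is to print the relative change in the relevant counters every few seconds and glance at what InnoDB's main thread reports it is doing:

    # Show the change in the flushing and dirty-page counters every 10 seconds
    # (-r prints differences between samples, -i10 sets the interval)
    $ mysqladmin ext -r -i10 | grep -e Innodb_buffer_pool_pages_flushed \
                                    -e Innodb_buffer_pool_pages_dirty

    # Check the main thread state reported by InnoDB
    $ mysql -e 'SHOW ENGINE INNODB STATUS\G' | grep 'Main thread'

If the flushed counter barely moves and the dirty-page count is small, as it was here, a forced checkpoint can't account for hundreds of megabytes of writes per second.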
Distinguishing cause from effect can be hard sometimes, and it can be tempting to just
skip the investigation and jump to the diagnosis when a problem looks familiar. It is
good to avoid taking shortcuts, but it's equally important to pay attention to your
intuition. If something looks familiar, it is prudent to spend a little time measuring the
necessary and sufficient conditions to prove whether that's the problem. This can save
a lot of time you'd otherwise spend looking through other data about the system and
its performance. Just try not to jump to conclusions based on a gut feeling that “I've
seen this before, so it must be the same thing.”
[18] Or whatever that phrase is. Put all your eggs in one haystack?