problem based on intuition and experience. But we also saw something that didn't
make sense. If you look at the iostat output again, in the wsec/s column you can see
that for about six seconds, the server is writing hundreds of megabytes of data per
second to the disks. Each sector is 512 bytes, so those samples show up to 150 MB of
writes per second at times. Yet the entire database is only 900 MB, and the workload
is mostly SELECT queries. How can this happen?
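For reference, converting iostat's wsec/s column into a byte rate is just a multiplication by the sector size. The command and figures below are illustrative rather than the actual samples from this server, and depending on the sysstat version, iostat may report wkB/s or wMB/s instead of wsec/s:

    # Extended per-device statistics, sampled once per second; the wsec/s
    # column is sectors written per second
    $ iostat -dx 1

    # With 512-byte sectors, a burst of roughly 300,000 wsec/s corresponds to
    #   300,000 sectors/s x 512 bytes/sector = 153,600,000 bytes/s, about 150 MB/s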
When you examine a system, try to ask yourself whether there are any things like this
that simply don't add up, and investigate them further. Try to follow each train of
thought to its conclusion, and try not to get sidetracked on too many tangents, or you
could forget about a promising possibility. Write down little notes and cross them off
to help ensure that you've dotted all the Ts.[18]
At this point, we could jump right to a conclusion, and it would be wrong. We see from
the main thread state that InnoDB is trying to flush dirty pages, which generally doesn't
appear in the status output unless flushing is delayed. We know that this version of
InnoDB is prone to the “furious flushing” problem, also called a checkpoint stall. This
is what happens when InnoDB doesn't spread flushing out evenly over time, and it
suddenly decides to force a checkpoint (flush a lot of data) to make up for that. This
can cause serious blocking inside InnoDB, making everything queue and wait to enter
the kernel, and thus pile up at the layers above InnoDB in the server. We showed an
example in Chapter 2 of the periodic drops in performance that can happen when there
is furious flushing. Many of this server's symptoms are similar to what happens during
a forced checkpoint, but it's not the problem in this case. You can prove that in many
ways, perhaps most easily by looking at the SHOW STATUS counters and tracking the
change in the Innodb_buffer_pool_pages_flushed counter, which, as we mentioned
earlier, was not increasing much. In addition, we noted that the buffer pool doesn't
have much dirty data to flush anyway—certainly not hundreds of megabytes. This is
not surprising, because the workload on this server is almost entirely SELECT queries.
We can therefore conclude that instead of blaming the problem on InnoDB's
flushing, we should blame InnoDB's flushing delay on the problem. It is a symptom
(an effect), not a cause. The underlying problem is causing the disks to become so
saturated that InnoDB isn't having any luck getting its I/O tasks done. So we can
eliminate this as a possible cause, and cross off one of our intuition-based ideas.
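One way to run that kind of check, sketched here with generic commands rather than the exact ones used on this server, is to print the relative change in the relevant counters every few seconds and glance at what InnoDB's main thread reports it is doing:

    # Show the change in the flushing and dirty-page counters every 10 seconds
    # (-r prints differences between samples, -i10 sets the interval)
    $ mysqladmin ext -r -i10 | grep -e Innodb_buffer_pool_pages_flushed \
                                    -e Innodb_buffer_pool_pages_dirty

    # Check the main thread state reported by InnoDB
    $ mysql -e 'SHOW ENGINE INNODB STATUS\G' | grep 'Main thread'

If the flushed counter barely moves and the dirty-page count is small, as it was here, a forced checkpoint can't account for hundreds of megabytes of writes per second.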
Distinguishing cause from effect can be hard sometimes, and it can be tempting to just
skip the investigation and jump to the diagnosis when a problem looks familiar. It is
good to avoid taking shortcuts, but it's equally important to pay attention to your
intuition. If something looks familiar, it is prudent to spend a little time measuring the
necessary and sufficient conditions to prove whether that's the problem. This can save
a lot of time you'd otherwise spend looking through other data about the system and
its performance. Just try not to jump to conclusions based on a gut feeling that “I've
seen this before, so it must be the same thing.”
[18] Or whatever that phrase is. Put all your eggs in one haystack?