Profiling Server Performance - High Performance MySQL

Databases Reference

In-Depth Information

bad queries all at once saturating the disk. The most common cause of this we've seen

(this is our intuition at work) is a cache stampede, when cached items expire all at once

from memcached and many instances of the application try to repopulate the cache

simultaneously. We showed samples of the queries to the developers and discussed

their purpose. Indeed, it turned out that simultaneous cache expiration was the cause

(confirming our intuition). In addition to the developers addressing the problem at the

application level, we were able to help them modify the queries so they didn't use

temporary tables on disk. Either one of these fixes might have prevented the problem,

but it was much better to do both than just one.

Now, we'd like to apply a little hindsight to explain some questions you might have

had as we went along (we certainly critiqued our own approach as we reviewed it for

this chapter):

Why didn't we just optimize the slow queries to begin with?

Because the problem wasn't slow queries, it was “too many connections” errors.

Sure, it's logical to see that long-running queries cause things to stack up and the

connection count to climb. But so can dozens of other things. In the absence of

finding a good reason for why things are going wrong, it's all too tempting to fall

back to looking for slow queries or other general things that look like they could

be improved. 20 But this goes badly more often than it goes well. If you took your

car to the mechanic and complained about an unfamiliar noise, and then got slap-

ped with a bill for balancing the tires and changing the transmission fluid because

the mechanic couldn't figure out the real problem and went looking for other things

to do, wouldn't you be annoyed?

But isn't it a red flag that the queries were running slowly with a bad EXPLAIN ?

They were indeed—during the incidents. Was that a cause or an effect? It wasn't

obvious until we dug into things more deeply. And remember, the queries seemed

to be running well enough in normal circumstances. Just because a query does a

filesort with a temporary table doesn't mean it is a problem. Getting rid of filesorts

and temporary tables is a catch-all, “best practice” type of tactic.

Generic “best practices” reviews have their place, but they are seldom the solution

to highly specific problems. The problem could easily have been misconfiguration,

for example. We've seen many cases where someone tried to fix a misconfigured

server with tactics such as optimizing queries, which was ultimately a waste of time

and just prolonged the damage caused by the real problem.

If cached items were being regenerated many times, wouldn't there be multiple identical

queries?

Yes, and this is something we did not investigate at the time. Multiple threads

regenerating the same cached item would indeed cause many completely identical

queries. (This is different from having multiple queries of the same general type,

20. Also known as the “when all you have is a hammer, everything looks like a nail” approach.

Search WWH ::

Custom Search

Home