Databases Reference
In-Depth Information
bad queries all at once saturating the disk. The most common cause of this we've seen
(this is our intuition at work) is a cache stampede, when cached items expire all at once
from memcached and many instances of the application try to repopulate the cache
simultaneously. We showed samples of the queries to the developers and discussed
their purpose. Indeed, it turned out that simultaneous cache expiration was the cause
(confirming our intuition). In addition to the developers addressing the problem at the
application level, we were able to help them modify the queries so they didn't use
temporary tables on disk. Either one of these fixes might have prevented the problem,
but it was much better to do both than just one.
Now, we'd like to apply a little hindsight to explain some questions you might have
had as we went along (we certainly critiqued our own approach as we reviewed it for
this chapter):
Why didn't we just optimize the slow queries to begin with?
Because the problem wasn't slow queries, it was “too many connections” errors.
Sure, it's logical to see that long-running queries cause things to stack up and the
connection count to climb. But so can dozens of other things. In the absence of
finding a good reason for why things are going wrong, it's all too tempting to fall
back to looking for slow queries or other general things that look like they could
be improved. 20 But this goes badly more often than it goes well. If you took your
car to the mechanic and complained about an unfamiliar noise, and then got slap-
ped with a bill for balancing the tires and changing the transmission fluid because
the mechanic couldn't figure out the real problem and went looking for other things
to do, wouldn't you be annoyed?
But isn't it a red flag that the queries were running slowly with a bad EXPLAIN ?
They were indeed—during the incidents. Was that a cause or an effect? It wasn't
obvious until we dug into things more deeply. And remember, the queries seemed
to be running well enough in normal circumstances. Just because a query does a
filesort with a temporary table doesn't mean it is a problem. Getting rid of filesorts
and temporary tables is a catch-all, “best practice” type of tactic.
Generic “best practices” reviews have their place, but they are seldom the solution
to highly specific problems. The problem could easily have been misconfiguration,
for example. We've seen many cases where someone tried to fix a misconfigured
server with tactics such as optimizing queries, which was ultimately a waste of time
and just prolonged the damage caused by the real problem.
If cached items were being regenerated many times, wouldn't there be multiple identical
queries?
Yes, and this is something we did not investigate at the time. Multiple threads
regenerating the same cached item would indeed cause many completely identical
queries. (This is different from having multiple queries of the same general type,
20. Also known as the “when all you have is a hammer, everything looks like a nail” approach.
 
Search WWH ::




Custom Search