Databases Reference
In-Depth Information
The primary profiling tool we rely on for server internals on GNU/Linux (as opposed
to queries server-wide) is
oprofile
. We'll show examples of this a bit later. You can also
profile the server's system calls with
strace
, but we have found this to be riskier on
production systems. More on that later, too. For capturing queries to profile, we like
to use
tcpdump
. It's hard to turn the slow query log on and off reliably at a moment's
notice on most versions of MySQL, but you can get a pretty good simulation of it from
TCP traffic. Besides, the traffic is useful for lots of other kinds of analysis.
For wait analysis, we often use GDB stack traces.
14
Threads that are stuck in a particular
spot inside of MySQL for a long time tend to have the same stack trace. The procedure
is to start
gdb
, attach it to the
mysqld
process, and dump stack traces for all threads.
You can then use some short scripts to aggregate common stack traces together and do
the
sort|uniq|sort
magic to show which ones are most common. We'll show how to use
the
pt-pmp
tool for this a bit later.
You can also do wait analysis with data such as snapshots of
SHOW PROCESSLIST
and
SHOW INNODB STATUS
by observing thread and transaction states. None of these
approaches is perfectly foolproof, but in practice they work often enough to be very
helpful.
Gathering all of this data sounds like a lot of work! You probably anticipated this
already, but we've built a tool to do this for you too. It's called
pt-collect
, and it's also
part of Percona Toolkit. It's intended to be executed from
pt-stalk
. It needs to be run
as
root
in order to gather most of the important data. By default, it will collect data for
30 seconds and then exit. This is usually enough to diagnose most problems, but not
so much that it causes problems when there's a false positive.
The tool is easy to download and doesn't need any configuration—all of the configu-
ration goes into
pt-stalk
. You will want to ensure that
gdb
and
oprofile
are installed on
your server, and enable those in the
pt-stalk
configuration. You also need to ensure that
mysqld
has debug symbols.
15
When the trigger condition occurs, the tool will gather a
pretty complete set of data. It will create timestamped files in a specified directory. At
the time of writing, it's rather oriented toward GNU/Linux and will need tweaking on
other operating systems, but it's still a good place to start.
Interpreting the data
If you've set up your trigger condition correctly and let
pt-stalk
run long enough to
catch the problem in action a few times, you'll end up with a lot of data to sift through.
What's the most useful place to start? We suggest looking at just a few things, with two
14. A caveat: using GDB is intrusive. It will freeze the server momentarily, especially if you have a lot of
threads (connections), and can sometimes even crash it. The benefit still sometimes outweighs the risk.
If the server becomes unusable anyway during a stall, it's not such a bad thing to double-freeze it.
15. Sometimes symbols are omitted as an “optimization,” which really is not an optimization; it just makes
diagnosing problems harder. You can use the
nm
tool to check if you have them, and install the
debuginfo
packages for MySQL to supply symbols.