has its limitations, such as confusing units of measurement, sampling at intervals that
don't correspond to when the operating system updates the statistics, and the inability
to see all of the metrics at once. If these tools don't meet your needs, you might be
interested in dstat (http://dag.wieers.com/home-made/dstat/) or collectl (http://collectl
We also like to use mpstat to watch CPU statistics; it provides a much better idea of
how the CPUs are behaving individually, instead of grouping them all together. Sometimes
this is very important when you're diagnosing a problem. You might find blktrace to be
helpful when you're examining disk I/O usage, too.
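As a rough illustration of why per-CPU figures matter, the following Python sketch computes a busy percentage for each CPU from two snapshots of Linux's /proc/stat. It is not how mpstat is implemented. The function names parse_proc_stat, busy_percent, and sample_live are hypothetical, and the field order (user, nice, system, idle, iowait, irq, softirq, steal) assumes a reasonably modern Linux kernel.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: [counters]} for the per-CPU lines (cpu0, cpu1, ...)."""
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-CPU lines start with "cpu0", "cpu1", ...; the bare "cpu" line
        # is the aggregate over all CPUs, which we skip on purpose.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = [int(value) for value in fields[1:]]
    return cpus


def busy_percent(before, after):
    """Percent of non-idle time per CPU between two parsed snapshots."""
    result = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        deltas = [n - o for n, o in zip(new, old)]
        # Assumed order: user nice system idle iowait irq softirq steal.
        # Guest counters (fields 9 and 10) are already included in user/nice.
        total = sum(deltas[:8])
        idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
        result[name] = 100.0 * (total - idle) / total if total else 0.0
    return result


def sample_live(interval=1.0, path="/proc/stat"):
    """Take two live snapshots on a Linux host and return per-CPU busy %."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return busy_percent(before, after)
```

A single saturated CPU next to several idle ones is visible in this per-CPU view, whereas an averaged figure would hide it.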
We wrote our own replacement for iostat, called pt-diskstats. It's part of Percona
Toolkit. It addresses some of our complaints about iostat, such as the way that it presents
reads and writes in aggregate, and the lack of visibility into concurrency. It is also
interactive and keystroke-driven, so you can zoom in and out, change the aggregation,
filter out devices, and show and hide columns. It is a great way to slice and dice a sample
of disk statistics, which you can gather with a simple shell script even if you don't have
the tool installed. You can capture samples of disk activity and email or save them for
later analysis. In fact, the pt-stalk, pt-collect, and pt-sift trio of tools that we
introduced in Chapter 3 are designed to work well with pt-diskstats.
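The sampling idea mentioned above can be sketched in a few lines. The text describes a shell script; the version below uses Python only to keep the example self-contained. It appends timestamped copies of /proc/diskstats to a file for later analysis. The function names format_sample and collect are hypothetical, and the "TS <epoch seconds>" separator line is an assumption about the input format that should be checked against the pt-diskstats documentation before relying on it.

```python
import time


def format_sample(timestamp, diskstats_text):
    """Prefix one /proc/diskstats snapshot with a timestamp line."""
    # Assumption: each sample is introduced by a line of the form
    # "TS <epoch seconds>"; verify this against the tool's documentation.
    return "TS {:.9f}\n{}".format(timestamp, diskstats_text)


def collect(out_path, interval=1.0, count=10, source="/proc/diskstats"):
    """Append `count` timestamped snapshots of `source` to `out_path`."""
    with open(out_path, "a") as out:
        for i in range(count):
            with open(source) as f:
                out.write(format_sample(time.time(), f.read()))
            out.flush()  # keep the file usable even if we are interrupted
            if i < count - 1:
                time.sleep(interval)
```

On a Linux host, something like collect("diskstats-samples.txt", interval=1.0, count=60) would capture about a minute of activity that can be saved or emailed for later analysis.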
A CPU-Bound Machine
The vmstat output for a CPU-bound server usually shows a high value in the us column,
which reports time spent running non-kernel code. There can also be a high value in
the sy column, which is the system CPU usage; a value over 20% here is worrisome.
In most cases, there will also be several processes queued up for CPU time (reported
in the r column). Here's a sample:
$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff    cache  si  so    bi    bo    in    cs us sy id wa
10  2 740880  19256  46068 13719952   0   0  2788 11047  1423 14508 89  4  4  3
11  0 740880  19692  46144 13702944   0   0  2907 14073  1504 23045 90  5  2  3
 7  1 740880  20460  46264 13683852   0   0  3554 15567  1513 24182 88  5  3  3
10  2 740880  22292  46324 13670396   0   0  2640 16351  1520 17436 88  4  4  3
Notice that there are also a reasonable number of context switches (the cs column),
although we won't worry much about this unless there are 100,000 or more per second.
A context switch is when the operating system stops one process from running and
replaces it with another.
For example, a query that performs a noncovering index scan on a MyISAM table will
read an entry from the index, then read the row from a page on disk. If the page isn't
in the operating system cache, there will be a physical read to the disk, which will cause
a context switch to suspend the process until the I/O completes. Such a query can cause
lots of context switches.
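The rules of thumb in this section can be applied mechanically to vmstat output. The Python sketch below parses the sample shown earlier and flags each row. The function names parse_vmstat and cpu_bound_signs are hypothetical. The thresholds for sy (over 20%) and cs (100,000 or more per second) come from the text, while the cutoffs for a "high" us value (80) and for "several" queued processes (2) are assumptions chosen only for illustration.

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff    cache  si  so    bi    bo    in    cs us sy id wa
10  2 740880  19256  46068 13719952   0   0  2788 11047  1423 14508 89  4  4  3
11  0 740880  19692  46144 13702944   0   0  2907 14073  1504 23045 90  5  2  3
 7  1 740880  20460  46264 13683852   0   0  3554 15567  1513 24182 88  5  3  3
10  2 740880  22292  46324 13670396   0   0  2640 16351  1520 17436 88  4  4  3
"""


def parse_vmstat(text):
    """Turn vmstat output into a list of {column_name: int} rows."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # The column-name line ("r b swpd ..."); vmstat repeats it
            # periodically, so we simply refresh the header each time.
            header = fields
        elif header and fields[0].isdigit() and len(fields) >= len(header):
            rows.append(dict(zip(header, (int(v) for v in fields))))
    return rows


def cpu_bound_signs(row, us_high=80, sy_worrisome=20, r_queued=2, cs_heavy=100000):
    """Apply the section's rules of thumb to one parsed vmstat row."""
    return {
        # Assumed cutoff: the text only says "a high value" for us.
        "high_user_cpu": row["us"] >= us_high,
        # From the text: system CPU over 20% is worrisome.
        "worrisome_system_cpu": row["sy"] > sy_worrisome,
        # Assumed cutoff: the text only says "several processes queued".
        "processes_queued": row["r"] >= r_queued,
        # From the text: 100,000 or more context switches per second.
        "heavy_context_switching": row["cs"] >= cs_heavy,
    }


for row in parse_vmstat(SAMPLE):
    print(row["r"], row["us"], row["sy"], row["cs"], cpu_bound_signs(row))
```

Every row of the sample shows high user CPU and a populated run queue, while system CPU and context switching stay below the levels the text treats as a concern, which matches the description of a CPU-bound server.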