After working through the diagnostic data to reach this point, the next task was obvious: measure what was causing the I/O. Unfortunately, this was infeasible on the version of GNU/Linux the customer was using. We could have made an educated guess with some work, but we wanted to explore other options first. As a proxy, we could have measured how much I/O was coming from MySQL, but again, in this version of MySQL that wasn't really feasible due to lack of instrumentation.
Instead, we opted to try to observe MySQL's I/O, based on what we know about how
it uses the disk. In general, MySQL writes only data, logs, sort files, and temporary
tables to disk. We eliminated data and logs from consideration, based on the status
counters and other information we discussed earlier. Now, suppose MySQL were to suddenly write a bunch of data to disk in temporary tables or sort files. How could we
observe this? Two easy ways are to watch the amount of free space on the disk, or to
look at the server's open filehandles with the lsof command. We did both, and the
results were convincing enough to satisfy us. Here's what df -h showed every second
during the same incident we've been studying:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 58G 20G 36G 36% /
/dev/sda3 58G 20G 36G 36% /
/dev/sda3 58G 19G 36G 35% /
/dev/sda3 58G 19G 36G 35% /
/dev/sda3 58G 19G 36G 35% /
/dev/sda3 58G 19G 36G 35% /
/dev/sda3 58G 18G 37G 33% /
/dev/sda3 58G 18G 37G 33% /
/dev/sda3 58G 18G 37G 33% /
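Sampling free space like this needs nothing more than a shell loop. A minimal sketch (the one-second interval and root mount point match the output above; the five-sample count is just for illustration):

```shell
# Print free-space stats for the root filesystem once per second.
# Five samples here for illustration; during an incident you would
# let this run until the burst of activity has passed.
for i in 1 2 3 4 5; do
    df -h /
    sleep 1
done
```

Watching the Used and Avail columns change from sample to sample reveals bursts of temporary-file activity even when per-process I/O accounting isn't available.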
And here's the data from lsof, which for some reason we gathered only once every five seconds. We're simply summing the sizes of all of the files mysqld has open in /tmp, and printing out the total for each timestamped sample in the file:
$ awk '
    /mysqld.*tmp/ {
        total += $7;
    }
    /^Sun Mar 28/ && total {
        printf "%s %7.2f MB\n", $4, total/1024/1024;
        total = 0;
    }' lsof.txt
18:34:38 1655.21 MB
18:34:43 1.88 MB
18:34:48 1.88 MB
18:34:53 1.88 MB
18:34:58 1.88 MB
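To make the awk logic concrete, here's a self-contained run against a tiny synthetic lsof.txt. The PIDs, file sizes, and filenames are invented, and we assume each sample's timestamp line appears after its lsof output, which is what the script's print-on-timestamp logic implies:

```shell
# Build a fake one-sample file: two 1 MB temporary files open by mysqld,
# followed by the sample's timestamp line. All values are invented.
cat > lsof.txt <<'EOF'
mysqld 1234 mysql 5u REG 8,3 1048576 99 /tmp/#sql_1234_0.MYD
mysqld 1234 mysql 6u REG 8,3 1048576 98 /tmp/#sql_1234_0.MYI
Sun Mar 28 18:34:38 EDT
EOF

# Same summation as in the text: column 7 of lsof output is the
# file size in bytes, and column 4 of the date line is the time.
awk '
    /mysqld.*tmp/ { total += $7 }
    /^Sun Mar 28/ && total {
        printf "%s %7.2f MB\n", $4, total/1024/1024
        total = 0
    }' lsof.txt
# Two 1048576-byte files sum to 2.00 MB for the 18:34:38 sample
```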
Based on this data, it looks like MySQL is writing about 1.5 GB of data to temporary
tables in the beginning phases of the incident, and this matches what we found in the
SHOW PROCESSLIST states (“Copying to tmp table”). The evidence points to a storm of
19. Someone call the 1-800 hotline!