• Each file is named after the date and hour when the benchmark is run. When
benchmarks last for days and the files grow large, you can move completed files
off the server to free disk space and get a head start on analyzing the full
results. When you're looking for data about a specific point in time, it's also
much easier to find it in a file named after the hour than to search through a
single file that has grown to gigabytes.
• Each sample begins with a distinctive timestamp line, so you can search through
the files for samples related to specific times, and you can easily write little
awk and sed scripts to extract them.
• The script doesn't preprocess or filter anything it gathers. It's a good idea to gather
everything in its raw form, and process and filter it later. If you preprocess it, you'll
surely find yourself wishing for the raw data later when you find an anomaly and
need more data to understand it.
• You can make the script exit when the benchmark is done by removing the /home/
benchmarks/running file from within the script that executes your benchmark (see
the sketch after this list).
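To make these points concrete, here is a minimal sketch of such a gathering loop. It is not the book's actual script; the interval, file-name prefix, and output names are illustrative assumptions:

    #!/bin/sh
    # A minimal sketch of the gathering loop described above; the
    # interval, prefix, and file names are illustrative assumptions.
    INTERVAL=5
    PREFIX=$INTERVAL-sec-status
    RUNFILE=/home/benchmarks/running
    while test -e $RUNFILE; do
        # One output file per date and hour keeps long runs manageable.
        file=$PREFIX-$(date +%F_%H)
        # Every sample begins with a distinctive timestamp line.
        echo "TS $(date +%s.%N) $(date '+%F %T')" >> $file
        # Gather raw, unfiltered output; process and filter it later.
        mysql -e 'SHOW GLOBAL STATUS' >> $file
        sleep $INTERVAL
    done
    echo Exiting because $RUNFILE does not exist

Because every sample begins with TS, pulling out the samples near a given moment is a one-liner; for example (the file name and timestamp prefix are made up):

    # Print every sample whose Unix timestamp begins with 1363.
    awk '/^TS/ { p = ($2 ~ /^1363/) } p' 5-sec-status-2013-03-13_15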
This is just a short code snippet, and probably won't meet your needs as-is, but it's an
illustration of a good general approach to capturing performance and status data. As
shown, the script captures only a few kinds of data on MySQL, but you can easily add
more things to it. You can capture /proc/diskstats to record disk I/O for later analysis
with the pt-diskstats tool,5 for example.
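In the sketch above, that would be two more lines inside the loop; the output file name is again an assumption:

    # Save a timestamp and the raw disk I/O counters alongside each
    # sample; pt-diskstats can process the saved samples offline later.
    echo "TS $(date +%s.%N)" >> $file-diskstats
    cat /proc/diskstats >> $file-diskstats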
Getting Accurate Results
The best way to get accurate results is to design your benchmark to answer the question
you want to answer. Have you chosen the right benchmark? Are you capturing the data
you need to answer the question? Are you benchmarking by the wrong criteria? For
example, are you running a CPU-bound benchmark to predict the performance of an
application you know will be I/O-bound?
Next, make sure your benchmark results will be repeatable. Try to ensure that the
system is in the same state at the beginning of each run. If the benchmark is important,
you should reboot between runs. If you need to benchmark on a warmed-up server,
which is the norm, you should also make sure that your warmup is long enough (see
the previous section on how long to run a benchmark), and that it's repeatable. If the
warmup consists of random queries, for example, your benchmark results will not be
repeatable.
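For example, replaying a fixed query file is one simple way to make the warmup deterministic; in this sketch, the file and database names are made up:

    # Replay the same fixed set of queries before every run, so the
    # server's caches reach a comparable state each time; --force
    # continues past individual query errors.
    mysql --force bench < warmup-queries.sql > /dev/null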
If the benchmark changes data or schema, reset it with a fresh snapshot between runs.
Inserting into a table with a thousand rows will not give the same results as inserting
into a table with a million rows! The data fragmentation and layout on disk can also
make your results nonrepeatable.
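A simple way to reset between runs is to drop and reload the test database from a pristine dump; a minimal sketch, with made-up names:

    # Recreate the test database from a known-good dump so every run
    # starts from exactly the same data and schema (names are made up).
    mysql -e 'DROP DATABASE IF EXISTS bench; CREATE DATABASE bench'
    mysql bench < pristine-dump.sql

For large datasets, restoring a filesystem or LVM snapshot achieves the same effect much faster than reloading a dump.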
5. See Chapter 9 for more on the pt-diskstats tool.