(response time) is acceptable. Most systems are not linearly scalable, and exhibit
diminishing returns and degraded performance as you vary the parameters.
Scalability measurements are good for capacity planning, because they can show
weaknesses in your application that other benchmark strategies won't show. For
example, if you design your system to perform well on a response-time benchmark
with a single connection (a poor benchmark strategy), your application might perform badly when there's any degree of concurrency. A benchmark that looks for
consistent response times under an increasing number of connections would show
this design flaw.
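As a rough illustration of this tactic, here is a minimal Python sketch that measures per-request response times at increasing concurrency levels and reports the 95th percentile for each level. The run_query() function is a hypothetical placeholder; a real benchmark would issue a representative query against the server under test.

    import concurrent.futures
    import random
    import statistics
    import time

    def run_query():
        """Hypothetical placeholder for one representative query
        against the system under test."""
        time.sleep(random.uniform(0.01, 0.05))  # simulated work

    def timed_query():
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    # Measure response times at increasing levels of concurrency.
    for concurrency in (1, 2, 4, 8, 16, 32):
        with concurrent.futures.ThreadPoolExecutor(concurrency) as pool:
            futures = [pool.submit(timed_query) for _ in range(concurrency * 50)]
            times = [f.result() for f in futures]
        p95 = statistics.quantiles(times, n=20)[-1]  # 95th percentile
        print(f"{concurrency:3d} connections: p95 = {p95 * 1000:6.1f} ms")

Roughly flat percentiles as the connection count climbs suggest the design scales; sharply rising ones expose the kind of flaw just described.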
Some activities, such as batch jobs to create summary tables from granular data,
just need fast response times, period. It's fine to benchmark them for pure response
time, but remember to think about how they'll interact with other activities. Batch
jobs can cause interactive queries to suffer, and vice versa.
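One way to quantify that interaction is to time the batch job twice, once in isolation and once while interactive queries run concurrently. Here is a minimal sketch, assuming hypothetical batch_job() and interactive_query() placeholders for the real workloads:

    import threading
    import time

    def batch_job():
        """Hypothetical summary-table build; replace with the real job."""
        time.sleep(2.0)

    def interactive_query():
        """Hypothetical short interactive query."""
        time.sleep(0.02)

    def time_batch(with_load):
        stop = threading.Event()

        def background():
            while not stop.is_set():
                interactive_query()

        workers = [threading.Thread(target=background) for _ in range(8)] if with_load else []
        for w in workers:
            w.start()
        start = time.perf_counter()
        batch_job()
        elapsed = time.perf_counter() - start
        stop.set()
        for w in workers:
            w.join()
        return elapsed

    print(f"batch alone:     {time_batch(False):.2f} s")
    print(f"batch with load: {time_batch(True):.2f} s")

With the sleep-based stubs the two timings will match; against a real server, a large gap between them reveals how much the two kinds of work interfere.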
In the final analysis, it's best to benchmark whatever is important to your users. Try to
gather some requirements (formally or informally) about what acceptable response
times are, what kind of concurrency you expect, and so on. Then try to design your
benchmarks to satisfy all of the requirements, without getting tunnel vision and focusing on some things to the exclusion of others.
Benchmarking Tactics
With the general principles behind us, let's move on to the specifics of how to design and execute
benchmarks. Before we discuss how to do benchmarks well, though, let's look at some
common mistakes that can lead to unusable or inaccurate results:
• Using a subset of the real data size, such as using only one gigabyte of data when
the application will need to handle hundreds of gigabytes, or using the current
dataset when you plan for the application to grow much larger.
• Using incorrectly distributed data, such as uniformly distributed data when the
real system's data will have “hot spots.” (Randomly generated data is almost always
unrealistically distributed.)
• Using unrealistically distributed parameters, such as pretending that all user
profiles are equally likely to be viewed.² (See the sketch below for one way to
skew parameters realistically.)
• Using a single-user scenario for a multiuser application.
• Benchmarking a distributed application on a single server.
• Failing to match real user behavior, such as “think time” on a web page. Real users
request a page and then read it; they don't click on links one after another without
pausing.
2. Justin Bieber, we love you! Just kidding.
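The distribution and think-time mistakes are straightforward to avoid in a benchmark driver. As a sketch, with a hypothetical fetch_profile() placeholder, the following draws profile IDs from a Zipf-like skewed distribution, so a few "hot" profiles dominate the workload as they would in production, and pauses between requests to imitate a user reading the page:

    import random
    import time
    from itertools import accumulate

    NUM_PROFILES = 100_000

    # Zipf-like weights: profile k is requested with probability
    # proportional to 1/k, so a handful of profiles stay "hot."
    ids = range(1, NUM_PROFILES + 1)
    cum_weights = list(accumulate(1.0 / k for k in ids))

    def fetch_profile(profile_id):
        """Hypothetical placeholder for viewing one user profile."""
        pass

    for _ in range(1000):
        profile_id = random.choices(ids, cum_weights=cum_weights)[0]
        fetch_profile(profile_id)
        time.sleep(random.uniform(1.0, 5.0))  # "think time" while the user reads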
 