(response time) is acceptable. Most systems are not linearly scalable, and exhibit
diminishing returns and degraded performance as you vary the parameters.
Scalability measurements are good for capacity planning, because they can show
weaknesses in your application that other benchmark strategies won't show. For
example, if you design your system to perform well on a response-time benchmark
with a single connection (a poor benchmark strategy), your application might perform badly when there's any degree of concurrency. A benchmark that looks for
consistent response times under an increasing number of connections would show
this design flaw.
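As a rough illustration of this tactic, here is a minimal Python sketch that measures per-request response times at increasing concurrency levels and reports the 95th percentile for each level. The run_query() function is a hypothetical placeholder; a real benchmark would issue a representative query against the server under test.

    import concurrent.futures
    import random
    import statistics
    import time

    def run_query():
        """Hypothetical placeholder for one representative query
        against the system under test."""
        time.sleep(random.uniform(0.01, 0.05))  # simulated work

    def timed_query():
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    # Measure response times at increasing levels of concurrency.
    for concurrency in (1, 2, 4, 8, 16, 32):
        with concurrent.futures.ThreadPoolExecutor(concurrency) as pool:
            futures = [pool.submit(timed_query) for _ in range(concurrency * 50)]
            times = [f.result() for f in futures]
        p95 = statistics.quantiles(times, n=20)[-1]  # 95th percentile
        print(f"{concurrency:3d} connections: p95 = {p95 * 1000:6.1f} ms")

Roughly flat percentiles as the connection count climbs suggest the design scales; sharply rising ones expose the kind of flaw just described.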
Some activities, such as batch jobs to create summary tables from granular data,
just need fast response times, period. It's fine to benchmark them for pure response
time, but remember to think about how they'll interact with other activities. Batch
jobs can cause interactive queries to suffer, and vice versa.
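One way to quantify that interaction is to time the batch job twice, once in isolation and once while interactive queries run concurrently. Here is a minimal sketch, assuming hypothetical batch_job() and interactive_query() placeholders for the real workloads:

    import threading
    import time

    def batch_job():
        """Hypothetical summary-table build; replace with the real job."""
        time.sleep(2.0)

    def interactive_query():
        """Hypothetical short interactive query."""
        time.sleep(0.02)

    def time_batch(with_load):
        stop = threading.Event()

        def background():
            while not stop.is_set():
                interactive_query()

        workers = [threading.Thread(target=background) for _ in range(8)] if with_load else []
        for w in workers:
            w.start()
        start = time.perf_counter()
        batch_job()
        elapsed = time.perf_counter() - start
        stop.set()
        for w in workers:
            w.join()
        return elapsed

    print(f"batch alone:     {time_batch(False):.2f} s")
    print(f"batch with load: {time_batch(True):.2f} s")

With the sleep-based stubs the two timings will match; against a real server, a large gap between them reveals how much the two kinds of work interfere.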
In the final analysis, it's best to benchmark whatever is important to your users. Try to
gather some requirements (formally or informally) about what acceptable response
times are, what kind of concurrency you expect, and so on. Then try to design your
benchmarks to satisfy all of the requirements, without getting tunnel vision and focusing on some things to the exclusion of others.
Benchmarking Tactics
With the general principles behind us, let's move on to the specifics of how to design and execute
benchmarks. Before we discuss how to do benchmarks well, though, let's look at some
common mistakes that can lead to unusable or inaccurate results:
• Using a subset of the real data size, such as using only one gigabyte of data when
the application will need to handle hundreds of gigabytes, or using the current
dataset when you plan for the application to grow much larger.
• Using incorrectly distributed data, such as uniformly distributed data when the
real system's data will have “hot spots.” (Randomly generated data is almost always
unrealistically distributed.)
• Using unrealistically distributed parameters, such as pretending that all user
profiles are equally likely to be viewed.² (See the sketch below for one way to
skew parameters realistically.)
• Using a single-user scenario for a multiuser application.
• Benchmarking a distributed application on a single server.
• Failing to match real user behavior, such as “think time” on a web page. Real users
request a page and then read it; they don't click on links one after another without
pausing.
2. Justin Bieber, we love you! Just kidding.
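The distribution and think-time mistakes are straightforward to avoid in a benchmark driver. As a sketch, with a hypothetical fetch_profile() placeholder, the following draws profile IDs from a Zipf-like skewed distribution, so a few "hot" profiles dominate the workload as they would in production, and pauses between requests to imitate a user reading the page:

    import random
    import time
    from itertools import accumulate

    NUM_PROFILES = 100_000

    # Zipf-like weights: profile k is requested with probability
    # proportional to 1/k, so a handful of profiles stay "hot."
    ids = range(1, NUM_PROFILES + 1)
    cum_weights = list(accumulate(1.0 / k for k in ids))

    def fetch_profile(profile_id):
        """Hypothetical placeholder for viewing one user profile."""
        pass

    for _ in range(1000):
        profile_id = random.choices(ids, cum_weights=cum_weights)[0]
        fetch_profile(profile_id)
        time.sleep(random.uniform(1.0, 5.0))  # "think time" while the user reads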
 