vendor-independent Transaction Processing Council (TPC) to try to create realistic and fair
benchmarks for TP. The TPC benchmarks are described at www.tpc.org.
The first TPC benchmark, TPC-A, was published in 1985 and has since been replaced and enhanced by several different benchmarks. TPC-C, initially created in 1992, simulates a complex query environment. TPC-H models ad hoc decision support: the queries are unrelated, and knowledge of past queries cannot be used to optimize future queries. TPC-E is a new On-Line Transaction Processing (OLTP) workload that simulates a brokerage firm's customer accounts. The most recent effort is TPC Energy, which adds energy metrics to all the existing TPC benchmarks.
All the TPC benchmarks measure performance in transactions per second. In addition, they include a response time requirement, so that throughput performance is measured only when the response time limit is met. To model real-world systems, higher transaction rates are also associated with larger systems, in terms of both users and the database to which the transactions are applied. Finally, the system cost for a benchmark system must also be included, allowing accurate comparisons of cost-performance. TPC modified its pricing policy so that there is a single specification for all the TPC benchmarks and so that the prices TPC publishes can be verified.
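To make the combination of metrics concrete, the following is a minimal Python sketch of how a TPC-style result pairs throughput with a response-time requirement and with total system cost. The function name, the 90th-percentile check, and all numbers are illustrative assumptions, not details of any actual TPC specification or published result.

```python
# Minimal sketch of TPC-style reporting: throughput counts only when the
# response-time requirement is met, and it is paired with system cost to
# give a cost-performance figure. All values are hypothetical.

def tpc_style_summary(transactions_completed, measurement_seconds,
                      p90_response_time_s, response_limit_s,
                      total_system_cost_usd):
    """Return (throughput in transactions/second, dollars per tps)."""
    if p90_response_time_s > response_limit_s:
        raise ValueError("run invalid: response-time requirement not met")
    tps = transactions_completed / measurement_seconds
    price_performance = total_system_cost_usd / tps  # cost-performance, $/tps
    return tps, price_performance

# Hypothetical run: 9 million transactions over a 2-hour window on a
# $1,500,000 configuration, with a 0.8 s 90th-percentile response time
# against a 1.0 s limit.
tps, dollars_per_tps = tpc_style_summary(9_000_000, 7200, 0.8, 1.0, 1_500_000)
print(f"{tps:,.0f} tps at ${dollars_per_tps:,.0f} per tps")  # 1,250 tps at $1,200 per tps
```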
Reporting Performance Results
The guiding principle of reporting performance measurements should be reproducibility: list everything another experimenter would need to duplicate the results. A SPEC benchmark report requires an extensive description of the computer and the compiler flags, as well as the publication of both the baseline and optimized results. In addition to hardware, software, and baseline tuning parameter descriptions, a SPEC report contains the actual performance times, shown both in tabular form and as a graph. A TPC benchmark report is even more complete, since it must include results of a benchmarking audit and cost information. These reports are excellent sources for finding the real costs of computing systems, since manufacturers compete on high performance and cost-performance.
Summarizing Performance Results
In practical computer design, you must evaluate myriad design choices for their relative quantitative benefits across a suite of benchmarks believed to be relevant. Likewise, consumers trying to choose a computer will rely on performance measurements from benchmarks, which hopefully are similar to the user's applications. In both cases, it is useful to have measurements for a suite of benchmarks so that the performance of important applications is similar to that of one or more benchmarks in the suite and so that variability in performance can be understood. In the ideal case, the suite resembles a statistically valid sample of the application space, but such a sample requires more benchmarks than are typically found in most suites and requires a randomized sampling, which essentially no benchmark suite uses.
Once we have chosen to measure performance with a benchmark suite, we would like to be able to summarize the performance results of the suite in a single number. A straightforward approach to computing a summary result would be to compare the arithmetic means of the execution times of the programs in the suite. Alas, some SPEC programs take four times longer than others do, so those programs would be much more important if the arithmetic mean were the single number used to summarize performance. An alternative would be to add a weighting factor to each benchmark and use the weighted arithmetic mean as the single number to summarize performance. The problem would then be how to pick weights; since
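The two summaries just described can be sketched directly. The execution times and the weights below are invented for illustration and do not correspond to any real SPEC programs or to any recommended weighting.

```python
# Invented execution times (seconds) for a four-program suite in which one
# program runs roughly four times longer than the others.
times = {"A": 250.0, "B": 300.0, "C": 280.0, "D": 1100.0}

# Unweighted arithmetic mean: the long-running program D dominates the result.
arith_mean = sum(times.values()) / len(times)

# Weighted arithmetic mean: the outcome depends entirely on the weights
# chosen, which is exactly the difficulty raised in the text. These weights
# are a purely hypothetical choice.
weights = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
weighted_mean = sum(weights[name] * t for name, t in times.items())

print(f"arithmetic mean          = {arith_mean:.1f} s")    # 482.5 s
print(f"weighted arithmetic mean = {weighted_mean:.1f} s")  # 356.0 s
```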