3. A Web index search (AltaVista) benchmark based on a search of a memory-mapped version
of the AltaVista database (200 GB). The inner loop is heavily optimized. Because the
search structure is static, little synchronization is needed among the threads. AltaVista was
the most popular Web search engine before the arrival of Google.
Figure 5.10 shows the percentages of time spent in user mode, in the kernel, and in the idle
loop. The frequency of I/O increases both the kernel time and the idle time (see the OLTP
entry, which has the largest I/O-to-computation ratio). AltaVista, which maps the entire search
database into memory and has been extensively tuned, shows the least kernel or idle time.
FIGURE 5.10 The distribution of execution time in the commercial workloads. The
OLTP benchmark has the largest fraction of both OS time and processor idle time (which is
I/O wait time). The DSS benchmark shows much less OS time, since it does less I/O, but still
more than 9% idle time. The extensive tuning of the AltaVista search engine is clear in these
measurements. The data for this workload were collected by Barroso, Gharachorloo, and
Bugnion [1998] on a four-processor AlphaServer 4100.
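As a minimal sketch of how a user/kernel/idle split like the one in Figure 5.10 can be derived from raw counters, the following Python function converts three time totals into percentages. The function name and the input numbers are hypothetical and chosen only for illustration; they are not the measured values from the figure.

```python
def time_distribution(user, kernel, idle):
    """Return the percentage of total time spent in each mode.

    The arguments are time totals in any common unit (ticks, cycles,
    seconds); only their ratios matter.
    """
    total = user + kernel + idle
    if total <= 0:
        raise ValueError("total time must be positive")
    return {
        "user": 100.0 * user / total,
        "kernel": 100.0 * kernel / total,
        "idle": 100.0 * idle / total,
    }


# Hypothetical counter values, not data from Figure 5.10.
example = time_distribution(user=600, kernel=150, idle=250)
for mode, percent in example.items():
    print(f"{mode:>6}: {percent:.1f}%")
```

An I/O-heavy workload would show up here as larger kernel and idle shares, which is the pattern the text describes for OLTP.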
Performance Measurements of the Commercial Workload
We start by looking at the overall processor execution for these benchmarks on the four-
processor system; as discussed on page 367, these benchmarks include substantial I/O time,
which is ignored in the processor time measurements. We group the six DSS queries as a
single benchmark, reporting the average behavior. The effective CPI varies widely for these
benchmarks, from a CPI of 1.3 for the AltaVista Web search, to an average CPI of 1.6 for
the DSS workload, to 7.0 for the OLTP workload. Figure 5.11 shows how the execution time
breaks down into instruction execution, cache and memory system access time, and other
stalls (which are primarily pipeline resource stalls but also include translation lookaside
buffer (TLB) and branch mispredict stalls). Although the performance of the DSS and AltaVista
workloads is reasonable, the performance of the OLTP workload is very poor, owing to poor
performance of the memory hierarchy.
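To make the CPI comparison concrete, here is a small Python sketch. The three effective CPI values are the ones quoted in the text; the helper names and the per-component CPI numbers used for the breakdown are assumptions for illustration only and are not taken from Figure 5.11.

```python
# Effective CPI values quoted in the text.
EFFECTIVE_CPI = {"AltaVista": 1.3, "DSS": 1.6, "OLTP": 7.0}


def relative_cpi(cpi_by_workload, baseline):
    """Express each workload's CPI as a multiple of the baseline's CPI."""
    base = cpi_by_workload[baseline]
    return {name: cpi / base for name, cpi in cpi_by_workload.items()}


def cpi_shares(components):
    """Turn per-component CPI contributions into fractions of the total."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total CPI must be positive")
    return {name: value / total for name, value in components.items()}


for name, ratio in relative_cpi(EFFECTIVE_CPI, "AltaVista").items():
    print(f"{name}: {ratio:.2f}x the AltaVista CPI")

# Hypothetical component CPIs (NOT measured data), in the three
# categories the text names for the execution-time breakdown.
hypothetical = {
    "instruction_execution": 1.0,
    "memory_system": 2.0,
    "other_stalls": 1.0,
}
for name, share in cpi_shares(hypothetical).items():
    print(f"{name}: {share:.0%} of execution time")
```

Because execution time per instruction is proportional to CPI at a fixed clock rate, each component's share of the total CPI is also its share of execution time.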
 