Thread-Level Parallelism - Computer Architecture: A Quantitative Approach


3. A Web index search (AltaVista) benchmark based on a search of a memory-mapped version

of the AltaVista database (200 GB). The inner loop is heavily optimized. Because the

search structure is static, little synchronization is needed among the threads. AltaVista was

the most popular Web search engine before the arrival of Google.

Figure 5.10 shows the percentages of time spent in user mode, in the kernel, and in the idle

loop. The frequency of I/O increases both the kernel time and the idle time (see the OLTP

entry, which has the largest I/O-to-computation ratio). AltaVista, which maps the entire search

database into memory and has been extensively tuned, shows the least kernel or idle time.

FIGURE 5.10 The distribution of execution time in the commercial workloads. The

OLTP benchmark has the largest fraction of both OS time and processor idle time (which is

I/O wait time). The DSS benchmark shows much less OS time, since it does less I/O, but still

more than 9% idle time. The extensive tuning of the AltaVista search engine is clear in these

measurements. The data for this workload were collected by Barroso, Gharachorloo, and

Bugnion [1998] on a four-processor AlphaServer 4100.

Performance Measurements of the Commercial Workload

We start by looking at the overall processor execution for these benchmarks on the four-processor

system; as discussed on page 367, these benchmarks include substantial I/O time,

which is ignored in the processor time measurements. We group the six DSS queries as a

single benchmark, reporting the average behavior. The effective CPI varies widely for these

benchmarks, from a CPI of 1.3 for the AltaVista Web search, to an average CPI of 1.6 for

the DSS workload, to 7.0 for the OLTP workload. Figure 5.11 shows how the execution time

breaks down into instruction execution, cache and memory system access time, and other

stalls (which are primarily pipeline resource stalls but also include translation lookaside buffer

(TLB) and branch mispredict stalls). Although the performance of the DSS and AltaVista

workloads is reasonable, the performance of the OLTP workload is very poor, due to the poor

performance of the memory hierarchy.
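
To make the CPI comparison concrete, the short Python sketch below works through the arithmetic. The three effective CPI values are the ones quoted in the text; the function names and the component split passed to `breakdown_fractions` in the usage note are illustrative assumptions, not measurements taken from Figure 5.11.

```python
# Effective CPI values quoted in the text for the three commercial workloads.
CPI = {"AltaVista": 1.3, "DSS": 1.6, "OLTP": 7.0}


def relative_cpi(cpi, baseline):
    """Return each workload's CPI as a multiple of the baseline workload's CPI."""
    base = cpi[baseline]
    return {name: value / base for name, value in cpi.items()}


def breakdown_fractions(execution, memory, other):
    """Split a CPI into fractions of time per category.

    The three arguments are CPI contributions (cycles per instruction) for
    instruction execution, cache and memory system access, and other stalls.
    Callers must supply their own numbers; none are given in the text.
    """
    total = execution + memory + other
    return {
        "execution": execution / total,
        "memory": memory / total,
        "other": other / total,
    }


if __name__ == "__main__":
    for name, ratio in relative_cpi(CPI, "AltaVista").items():
        print(f"{name}: {ratio:.2f}x the AltaVista CPI")
```

With the quoted values, OLTP spends roughly 5.4 times as many cycles per instruction as AltaVista (7.0 / 1.3). A call such as `breakdown_fractions(1.0, 5.0, 1.0)` uses purely hypothetical components that sum to 7.0; it only shows how a memory-dominated CPI would translate into fractions of execution time.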
