Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

FIGURE 3.32 The breakdown of causes for a thread not being ready. The contribution to the “other” category varies. In TPC-C, store buffer full is the largest contributor; in SPECJBB, atomic instructions are the largest contributor; and in SPECWeb99, both factors contribute.

Figure 3.33 shows the per-thread and per-core CPI. Because T1 is a fine-grained multithreaded processor with four threads per core, with sufficient parallelism the ideal effective CPI per thread would be four, since that would mean that each thread was consuming one cycle out of every four. The ideal CPI per core would be one. In 2005, the IPC for these benchmarks running on aggressive ILP cores would have been similar to that seen on a T1 core. The T1 core, however, was very modest in size compared to the aggressive ILP cores of 2005, which is why the T1 had eight cores compared to the two to four offered on other processors of the same vintage. As a result, in 2005 when it was introduced, the Sun T1 processor had the best performance on integer applications with extensive TLP and demanding memory performance, such as SPECJBB and transaction processing workloads.
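The relationship between the per-thread, per-core, and eight-core figures can be sketched with a few lines of arithmetic. The snippet below is only an illustration, assuming that the threads on a core and the cores on the chip contribute equally; the function name `t1_cpi_summary` and its parameters are hypothetical, and the only input used is the ideal per-thread CPI of four described in the text, not a measured benchmark value.

```python
def t1_cpi_summary(per_thread_cpi, threads_per_core=4, cores=8):
    """Derive per-core CPI, effective chip-wide CPI, and effective IPC.

    Assumes (for illustration) that every thread on a core and every
    core on the chip behaves identically.
    """
    # Four interleaved threads share one pipeline, so the core retires
    # threads_per_core instructions in the time one thread retires one.
    per_core_cpi = per_thread_cpi / threads_per_core
    # Eight cores run in parallel, dividing the effective CPI again.
    effective_cpi = per_core_cpi / cores
    # IPC is the inverse of CPI.
    effective_ipc = 1.0 / effective_cpi
    return per_core_cpi, effective_cpi, effective_ipc


# Ideal case from the text: each thread uses one cycle out of every four.
per_core, chip_cpi, chip_ipc = t1_cpi_summary(4.0)
print(per_core, chip_cpi, chip_ipc)
```

Under these assumptions the ideal case gives a per-core CPI of one, matching the text. A measured per-thread CPI above four would scale the other three quantities in the same way.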

FIGURE 3.33 The per-thread CPI, the per-core CPI, the effective eight-core CPI, and the effective IPC (inverse of CPI) for the eight-core T1 processor.
