FIGURE 3.32 The breakdown of causes for a thread being not ready. The contribution to
the “other” category varies. In TPC-C, store buffer full is the largest contributor; in SPECJBB,
atomic instructions are the largest contributor; and in SPECWeb99, both factors contribute.
Figure 3.33 shows the per-thread and per-core CPI. Because T1 is a fine-grained multithreaded
processor with four threads per core, with sufficient parallelism the ideal effective CPI
per thread would be four, since that would mean that each thread was consuming one cycle
out of every four. The ideal CPI per core would be one. In 2005, the IPC for these benchmarks
running on aggressive ILP cores would have been similar to that seen on a T1 core. The T1
core, however, was very modest in size compared to the aggressive ILP cores of 2005, which
is why the T1 had eight cores compared to the two to four offered on other processors of the
same vintage. As a result, in 2005 when it was introduced, the Sun T1 processor had the best
performance on integer applications with extensive TLP and demanding memory performance,
such as SPECJBB and transaction processing workloads.
FIGURE 3.33 The per-thread CPI, the per-core CPI, the effective eight-core CPI, and the
effective IPC (inverse of CPI) for the eight-core T1 processor.
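The relationship among the four quantities named in Figure 3.33 can be sketched with a little arithmetic. The following is a minimal illustration, not the measurement method used for the figure: it assumes every thread on the chip runs at the same average per-thread CPI, and the function name t1_cpi_summary and its parameters are hypothetical names chosen for this example. Only the counts of four threads per core and eight cores come from the text.

```python
# Counts taken from the text: four threads per core, eight cores per chip.
THREADS_PER_CORE = 4
CORES = 8


def t1_cpi_summary(per_thread_cpi, threads_per_core=THREADS_PER_CORE, cores=CORES):
    """Derive per-core CPI, effective chip CPI, and effective IPC.

    Assumption (not from the source): all threads share the same
    average per-thread CPI, so the aggregates are simple divisions.
    """
    # Each core interleaves its threads, so the core retires
    # threads_per_core instructions in the time one thread retires one.
    per_core_cpi = per_thread_cpi / threads_per_core
    # The cores run in parallel, so the chip-wide CPI shrinks again.
    effective_chip_cpi = per_core_cpi / cores
    # IPC is the inverse of CPI.
    effective_ipc = 1 / effective_chip_cpi
    return {
        "per_thread_cpi": per_thread_cpi,
        "per_core_cpi": per_core_cpi,
        "effective_chip_cpi": effective_chip_cpi,
        "effective_ipc": effective_ipc,
    }


# Ideal case from the text: each thread uses one cycle out of every four.
ideal = t1_cpi_summary(4.0)
print(ideal)
```

With the ideal per-thread CPI of four, the per-core CPI comes out to one, matching the text, and the effective eight-core IPC is eight. Any measured per-thread CPI above four lowers the effective IPC proportionally under the same equal-thread assumption.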
 
 