By similar reasoning, we cannot allow such instructions to cause the cache to stall on a miss, because again unnecessary stalls could overwhelm the benefits of speculation. Hence, these processors must be matched with nonblocking caches.

In reality, the penalty of an L2 miss is so large that compilers normally only speculate on L1 misses. Figure 2.5 on page 84 shows that for some well-behaved scientific programs the compiler can sustain multiple outstanding L2 misses to cut the L2 miss penalty effectively. Once again, for this to work, the memory system behind the cache must match the goals of the compiler in the number of simultaneous memory accesses.
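Whether the memory system can overlap misses depends on the program exposing independent addresses. As a minimal C sketch (the function names and the pointer-chasing contrast are illustrative, not from the text), compare a loop whose load addresses are all known in advance with one whose next address depends on the previous load:

#include <stddef.h>

/* Independent addresses: a nonblocking cache can have several of
 * these loads' misses outstanding at once. */
long sum_array(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];          /* address a+i is known before the load issues */
    return s;
}

struct node { long val; struct node *next; };

/* Dependent addresses: the next load cannot even be issued until the
 * current miss returns, so misses serialize regardless of the cache. */
long sum_list(const struct node *p) {
    long s = 0;
    while (p) {
        s += p->val;
        p = p->next;        /* next address comes from this load */
    }
    return s;
}

In the first loop a compiler or an out-of-order processor can keep multiple misses in flight; in the second, no amount of miss overlap in the cache helps.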
3.12 Multithreading: Exploiting Thread-Level Parallelism to Improve Uniprocessor Throughput
The topic we cover in this section, multithreading, is truly a cross-cutting topic, since it has relevance to pipelining and superscalars, to graphics processing units (Chapter 4), and to multiprocessors (Chapter 5). We introduce the topic here and explore the use of multithreading to increase uniprocessor throughput by using multiple threads to hide pipeline and memory latencies. In the next chapter, we will see how multithreading provides the same advantages in GPUs, and finally, Chapter 5 will explore the combination of multithreading and multiprocessing. These topics are closely interwoven, since multithreading is a primary technique for exposing more parallelism to the hardware. In a strict sense, multithreading uses thread-level parallelism, and thus is properly the subject of Chapter 5, but its role both in improving pipeline utilization and in GPUs motivates us to introduce the concept here.
Although increasing performance by using ILP has the great advantage that it is reasonably transparent to the programmer, as we have seen, ILP can be quite limited or difficult to exploit in some applications. In particular, with reasonable instruction issue rates, cache misses that go to memory or off-chip caches are unlikely to be hidden by available ILP; a 4-issue processor waiting 200 cycles for a memory access would need roughly 800 independent instructions to hide the stall, far more than most programs can supply. Of course, when the processor is stalled waiting on a cache miss, the utilization of the functional units drops dramatically.
Since attempts to cover long memory stalls with more ILP have limited effectiveness, it is natural to ask whether other forms of parallelism in an application could be used to hide memory delays. For example, an online transaction-processing system has natural parallelism among the multiple queries and updates that are presented by requests. Of course, many scientific applications contain natural parallelism, since they often model the three-dimensional, parallel structure of nature, and that structure can be exploited by using separate threads. Even desktop applications that use modern Windows-based operating systems often have multiple active applications running, providing a source of parallelism.
Multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion. In contrast, a more general method to exploit thread-level parallelism (TLP) is with a multiprocessor that has multiple independent threads operating at once and in parallel. Multithreading, however, does not duplicate the entire processor as a multiprocessor does. Instead, multithreading shares most of the processor core among a set of threads, duplicating only private state, such as the registers and program counter. As we will see in Chapter 5, many recent processors incorporate multiple processor cores on a single chip and also provide multithreading within each core.
Duplicating the per-thread state of a processor core means creating a separate register file, a separate PC, and a separate page table for each thread. The memory itself can be shared through the virtual memory mechanisms, which already support multiprogramming. In addition, the hardware must support the ability to change to a different thread relatively quickly; in particular, a thread switch should be much more efficient than a process switch, which typically requires hundreds to thousands of processor cycles.
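To make the split between duplicated and shared state concrete, here is a minimal C sketch of a multithreaded core's bookkeeping; the structure names, field names, and sizes are assumptions for illustration, not a real design:

#include <stdint.h>

#define NUM_THREADS 4            /* assumed number of hardware threads   */
#define NUM_REGS    32           /* assumed architectural register count */

/* Private state: one copy per hardware thread. */
struct thread_state {
    uint64_t regs[NUM_REGS];     /* architectural register file          */
    uint64_t pc;                 /* program counter                      */
    uint64_t page_table_base;    /* root of this thread's page table     */
};

/* Shared state: one copy serves all threads. Caches, TLBs, and the
 * functional units conceptually live here and are not duplicated. */
struct core {
    struct thread_state threads[NUM_THREADS];
    int active;                  /* which thread may issue this cycle    */
};

/* A hardware thread switch merely changes which thread_state feeds the
 * pipeline, which is why it can be far cheaper than an OS process
 * switch that saves and restores state in software. */
static void switch_thread(struct core *c, int next) {
    c->active = next % NUM_THREADS;
}

The key design point the sketch reflects is that only the small per-thread block is replicated; everything expensive about the core is time-shared among the threads.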