blocking and to cause all the functional units to stall. As the issue rate and number of memory
references become large, this synchronization restriction becomes unacceptable. In more re-
cent processors, the functional units operate more independently, and the compiler is used to
avoid hazards at issue time, while hardware checks allow for unsynchronized execution once
instructions are issued.
Binary code compatibility has also been a major logistical problem for VLIWs. In a strict
VLIW approach, the code sequence makes use of both the instruction set definition and the
detailed pipeline structure, including both functional units and their latencies. Thus, different
numbers of functional units and unit latencies require different versions of the code. This
requirement makes migrating between successive implementations, or between implementa-
tions with different issue widths, more difficult than it is for a superscalar design. Of course,
obtaining improved performance from a new superscalar design may require recompilation.
Nonetheless, the ability to run old binary files is a practical advantage for the superscalar ap-
proach.
The EPIC approach, of which the IA-64 architecture is the primary example, provides solu-
tions to many of the problems encountered in early VLIW designs, including extensions for
more aggressive software speculation and methods to overcome the limitations of hardware
dependence checking while preserving binary compatibility.
The major challenge for all multiple-issue processors is to try to exploit large amounts of
ILP. When the parallelism comes from unrolling simple loops in FP programs, the original
loop probably could have been run efficiently on a vector processor (described in the next
chapter). It is not clear that a multiple-issue processor is preferred over a vector processor
for such applications; the costs are similar, and the vector processor is typically the same
speed or faster. The potential advantages of a multiple-issue processor versus a vector pro-
cessor are their ability to extract some parallelism from less structured code and their ability
to easily cache all forms of data. For these reasons multiple-issue approaches have become the
primary method for taking advantage of instruction-level parallelism, and vectors have be-
come primarily an extension to these processors.
3.8 Exploiting ILP Using Dynamic Scheduling,
Multiple Issue, and Speculation
So far, we have seen how the individual mechanisms of dynamic scheduling, multiple issue,
and speculation work. In this section, we put all three together, which yields a microarchi-
tecture quite similar to those in modern microprocessors. For simplicity, we consider only an
issue rate of two instructions per clock, but the concepts are no different from modern pro-
cessors that issue three or more instructions per clock.
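To make the two-instructions-per-clock issue model concrete, here is a minimal sketch, in Python, of a front end that issues up to two instructions per cycle in program order to per-unit reservation stations, stalling on a structural hazard rather than issuing out of order. This is an illustrative toy, not the book's hardware; all class, field, and unit names (`DualIssueFrontEnd`, `"int"`, `"fp"`, `"mem"`) are hypothetical.

```python
# Toy model (illustrative only) of in-order dual issue to reservation
# stations. Unit names and capacities are hypothetical choices.
from collections import deque
from dataclasses import dataclass

@dataclass
class Instr:
    unit: str  # which functional unit this instruction needs, e.g. "int", "fp", "mem"

class DualIssueFrontEnd:
    """Issue up to two instructions per clock, strictly in program order.

    If the next instruction's reservation station is full, issue stalls for
    the rest of the cycle; later instructions are never issued ahead of it,
    which preserves program semantics at issue time.
    """
    def __init__(self, rs_slots):
        self.capacity = dict(rs_slots)            # entries per unit
        self.busy = {u: 0 for u in rs_slots}      # occupied entries
        self.queue = deque()                      # fetched, not yet issued
        self.issued = []                          # issued, in program order

    def fetch(self, instrs):
        self.queue.extend(instrs)

    def clock(self):
        """One cycle: issue at most two instructions. Returns the count issued."""
        issued_this_cycle = 0
        while self.queue and issued_this_cycle < 2:
            nxt = self.queue[0]
            if self.busy[nxt.unit] >= self.capacity[nxt.unit]:
                break  # structural hazard: stall; do NOT skip ahead in the queue
            self.busy[nxt.unit] += 1
            self.issued.append(self.queue.popleft())
            issued_this_cycle += 1
        return issued_this_cycle

    def complete(self, unit):
        """A functional unit finished an operation; free one RS entry."""
        self.busy[unit] -= 1
```

For example, with a single integer reservation-station entry, two back-to-back integer instructions force a stall even though a following FP instruction has a free station, because issue must remain in order.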
Let's assume we want to extend Tomasulo's algorithm to support a multiple-issue superscalar
pipeline with separate integer, load/store, and floating-point units (both FP multiply and FP
add), each of which can initiate an operation on every clock. We do not want to issue instruc-
tions to the reservation stations out of order, since this could lead to a violation of the program
semantics. To gain the full advantage of dynamic scheduling we will allow the pipeline to is-
sue any combination of two instructions in a clock, using the scheduling hardware to actually
assign operations to the integer and floating-point unit. Because the interaction of the integer
and floating-point instructions is crucial, we also extend Tomasulo's scheme to deal with both
the integer and floating-point functional units and registers, as well as incorporating specu-
lative execution. As Figure 3.17 shows, the basic organization is similar to that of a processor