blocking and to cause all the functional units to stall. As the issue rate and number of memory
references become large, this synchronization restriction becomes unacceptable. In more re-
cent processors, the functional units operate more independently, and the compiler is used to
avoid hazards at issue time, while hardware checks allow for unsynchronized execution once
instructions are issued.
Binary code compatibility has also been a major logistical problem for VLIWs. In a strict
VLIW approach, the code sequence makes use of both the instruction set definition and the
detailed pipeline structure, including both functional units and their latencies. Thus, different
numbers of functional units and unit latencies require different versions of the code. This
requirement makes migrating between successive implementations, or between implementa-
tions with different issue widths, more difficult than it is for a superscalar design. Of course,
obtaining improved performance from a new superscalar design may require recompilation.
Nonetheless, the ability to run old binary files is a practical advantage for the superscalar ap-
proach.
The EPIC approach, of which the IA-64 architecture is the primary example, provides solu-
tions to many of the problems encountered in early VLIW designs, including extensions for
more aggressive software speculation and methods to overcome the limitations of hardware
dependence checking while preserving binary compatibility.
The major challenge for all multiple-issue processors is to try to exploit large amounts of
ILP. When the parallelism comes from unrolling simple loops in FP programs, the original
loop probably could have been run efficiently on a vector processor (described in the next
chapter). It is not clear that a multiple-issue processor is preferred over a vector processor
for such applications; the costs are similar, and the vector processor is typically the same
speed or faster. The potential advantages of a multiple-issue processor versus a vector pro-
cessor are their ability to extract some parallelism from less structured code and their ability
to easily cache all forms of data. For these reasons multiple-issue approaches have become the
primary method for taking advantage of instruction-level parallelism, and vectors have be-
come primarily an extension to these processors.
3.8 Exploiting ILP Using Dynamic Scheduling,
Multiple Issue, and Speculation
So far, we have seen how the individual mechanisms of dynamic scheduling, multiple issue,
and speculation work. In this section, we put all three together, which yields a microarchi-
tecture quite similar to those in modern microprocessors. For simplicity, we consider only an
issue rate of two instructions per clock, but the concepts are no different from modern pro-
cessors that issue three or more instructions per clock.
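To make the two-instructions-per-clock issue model concrete, here is a minimal sketch, in Python, of a front end that issues up to two instructions per cycle in program order to per-unit reservation stations, stalling on a structural hazard rather than issuing out of order. This is an illustrative toy, not the book's hardware; all class, field, and unit names (`DualIssueFrontEnd`, `"int"`, `"fp"`, `"mem"`) are hypothetical.

```python
# Toy model (illustrative only) of in-order dual issue to reservation
# stations. Unit names and capacities are hypothetical choices.
from collections import deque
from dataclasses import dataclass

@dataclass
class Instr:
    unit: str  # which functional unit this instruction needs, e.g. "int", "fp", "mem"

class DualIssueFrontEnd:
    """Issue up to two instructions per clock, strictly in program order.

    If the next instruction's reservation station is full, issue stalls for
    the rest of the cycle; later instructions are never issued ahead of it,
    which preserves program semantics at issue time.
    """
    def __init__(self, rs_slots):
        self.capacity = dict(rs_slots)            # entries per unit
        self.busy = {u: 0 for u in rs_slots}      # occupied entries
        self.queue = deque()                      # fetched, not yet issued
        self.issued = []                          # issued, in program order

    def fetch(self, instrs):
        self.queue.extend(instrs)

    def clock(self):
        """One cycle: issue at most two instructions. Returns the count issued."""
        issued_this_cycle = 0
        while self.queue and issued_this_cycle < 2:
            nxt = self.queue[0]
            if self.busy[nxt.unit] >= self.capacity[nxt.unit]:
                break  # structural hazard: stall; do NOT skip ahead in the queue
            self.busy[nxt.unit] += 1
            self.issued.append(self.queue.popleft())
            issued_this_cycle += 1
        return issued_this_cycle

    def complete(self, unit):
        """A functional unit finished an operation; free one RS entry."""
        self.busy[unit] -= 1
```

For example, with a single integer reservation-station entry, two back-to-back integer instructions force a stall even though a following FP instruction has a free station, because issue must remain in order.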
Let's assume we want to extend Tomasulo's algorithm to support a multiple-issue superscalar
pipeline with separate integer, load/store, and floating-point units (both FP multiply and FP
add), each of which can initiate an operation on every clock. We do not want to issue instruc-
tions to the reservation stations out of order, since this could lead to a violation of the program
semantics. To gain the full advantage of dynamic scheduling we will allow the pipeline to is-
sue any combination of two instructions in a clock, using the scheduling hardware to actually
assign operations to the integer and floating-point unit. Because the interaction of the integer
and floating-point instructions is crucial, we also extend Tomasulo's scheme to deal with both
the integer and floating-point functional units and registers, as well as incorporating specu-
lative execution. As Figure 3.17 shows, the basic organization is similar to that of a processor