FIGURE 3.15 The five primary approaches in use for multiple-issue processors and the primary characteristics that distinguish them. This chapter has focused on the hardware-intensive techniques, which are all some form of superscalar. Appendix H focuses on compiler-based approaches. The EPIC approach, as embodied in the IA-64 architecture, extends many of the concepts of the early VLIW approaches, providing a blend of static and dynamic approaches.
The Basic VLIW Approach
VLIWs use multiple, independent functional units. Rather than attempting to issue multiple,
independent instructions to the units, a VLIW packages the multiple operations into one very
long instruction, or requires that the instructions in the issue packet satisfy the same con-
straints. Since there is no fundamental difference in the two approaches, we will just assume
that multiple operations are placed in one instruction, as in the original VLIW approach.
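As a rough illustration of this packaging, the sketch below packs independent operations into fixed slots of a single long instruction word. The slot names, the NOP filler, and the dict-based interface are hypothetical conveniences, not taken from any real VLIW ISA; the point is only that each instruction carries one operation per functional unit, with empty slots padded by NOPs.

```python
# Hypothetical sketch of VLIW operation packing. Slot names are
# illustrative; a real encoder would emit bit fields, not strings.
SLOTS = ["int_op", "fp_op1", "fp_op2", "mem_op1", "mem_op2"]
NOP = "nop"

def pack_vliw(ops):
    """Place each operation in its designated slot; unused slots get NOPs.

    `ops` maps a slot name to an operation string. The compiler, not the
    hardware, is responsible for guaranteeing the operations are independent.
    """
    return [ops.get(slot, NOP) for slot in SLOTS]

# One long instruction holding three useful operations and two NOPs:
instr = pack_vliw({"int_op": "add r1,r2,r3",
                   "mem_op1": "ld f4,0(r5)",
                   "fp_op1": "fmul f6,f4,f2"})
print(instr)
```

Note that when the compiler cannot find enough independent work, the padding NOPs directly waste issue slots, which is why the discussion below turns to finding enough parallelism to fill a wide instruction.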
Since the advantage of a VLIW increases as the maximum issue rate grows, we focus on a
wider issue processor. Indeed, for simple two-issue processors, the overhead of a superscalar
is probably minimal. Many designers would probably argue that a four-issue processor has
manageable overhead, but as we will see later in this chapter, the growth in overhead is a ma-
jor factor limiting wider issue processors.
Let's consider a VLIW processor with instructions that contain five operations, including
one integer operation (which could also be a branch), two floating-point operations, and
two memory references. The instruction would have a set of fields for each functional
unit—perhaps 16 to 24 bits per unit, yielding an instruction length of between 80 and 120 bits.
By comparison, the Intel Itanium 1 and 2 contain six operations per instruction packet (i.e.,
they allow concurrent issue of two three-instruction bundles, as Appendix H describes).
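The instruction-width figures quoted above follow directly from the slot count and per-slot field width; a quick check of the arithmetic:

```python
# Instruction width for the five-slot VLIW described in the text:
# five operation slots at 16 to 24 bits per slot.
slots = 5
min_bits = slots * 16
max_bits = slots * 24
print(min_bits, max_bits)  # 80 120
```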
To keep the functional units busy, there must be enough parallelism in a code sequence to
fill the available operation slots. This parallelism is uncovered by unrolling loops and scheduling the code within the single larger loop body. If the unrolling generates straight-line code,