FIGURE 3.15 The five primary approaches in use for multiple-issue processors and the primary characteristics that distinguish them. This chapter has focused on the hardware-intensive techniques, which are all some form of superscalar. Appendix H focuses on compiler-based approaches. The EPIC approach, as embodied in the IA-64 architecture, extends many of the concepts of the early VLIW approaches, providing a blend of static and dynamic approaches.
The Basic VLIW Approach
VLIWs use multiple, independent functional units. Rather than attempting to issue multiple,
independent instructions to the units, a VLIW packages the multiple operations into one very
long instruction, or requires that the instructions in the issue packet satisfy the same con-
straints. Since there is no fundamental difference in the two approaches, we will just assume
that multiple operations are placed in one instruction, as in the original VLIW approach.
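As a rough illustration of this packaging, the sketch below packs independent operations into fixed slots of a single long instruction word. The slot names, the NOP filler, and the dict-based interface are hypothetical conveniences, not taken from any real VLIW ISA; the point is only that each instruction carries one operation per functional unit, with empty slots padded by NOPs.

```python
# Hypothetical sketch of VLIW operation packing. Slot names are
# illustrative; a real encoder would emit bit fields, not strings.
SLOTS = ["int_op", "fp_op1", "fp_op2", "mem_op1", "mem_op2"]
NOP = "nop"

def pack_vliw(ops):
    """Place each operation in its designated slot; unused slots get NOPs.

    `ops` maps a slot name to an operation string. The compiler, not the
    hardware, is responsible for guaranteeing the operations are independent.
    """
    return [ops.get(slot, NOP) for slot in SLOTS]

# One long instruction holding three useful operations and two NOPs:
instr = pack_vliw({"int_op": "add r1,r2,r3",
                   "mem_op1": "ld f4,0(r5)",
                   "fp_op1": "fmul f6,f4,f2"})
print(instr)
```

Note that when the compiler cannot find enough independent work, the padding NOPs directly waste issue slots, which is why the discussion below turns to finding enough parallelism to fill a wide instruction.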
Since the advantage of a VLIW increases as the maximum issue rate grows, we focus on a
wider issue processor. Indeed, for simple two-issue processors, the overhead of a superscalar
is probably minimal. Many designers would probably argue that a four-issue processor has
manageable overhead, but as we will see later in this chapter, the growth in overhead is a ma-
jor factor limiting wider issue processors.
Let's consider a VLIW processor with instructions that contain five operations, including
one integer operation (which could also be a branch), two floating-point operations, and
two memory references. The instruction would have a set of fields for each functional
unit—perhaps 16 to 24 bits per unit, yielding an instruction length of between 80 and 120 bits.
By comparison, the Intel Itanium 1 and 2 contain six operations per instruction packet (i.e.,
they allow concurrent issue of two three-instruction bundles, as Appendix H describes).
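The instruction-width figures quoted above follow directly from the slot count and per-slot field width; a quick check of the arithmetic:

```python
# Instruction width for the five-slot VLIW described in the text:
# five operation slots at 16 to 24 bits per slot.
slots = 5
min_bits = slots * 16
max_bits = slots * 24
print(min_bits, max_bits)  # 80 120
```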
To keep the functional units busy, there must be enough parallelism in a code sequence to
fill the available operation slots. This parallelism is uncovered by unrolling loops and scheduling the code within the single larger loop body. If the unrolling generates straight-line code,