PARALLEL COMPUTER ARCHITECTURES - Structured Computer Organization


[Figure 8-2 labels. (a) Fetch, Decode, Issue, Retire; Integer, Floating point, Load, Store. (b) VLIW instruction, No-op. (c) Bundle, End-of-bundle marker.]

Figure 8-2. (a) A CPU pipeline. (b) A sequence of VLIW instructions. (c) An instruction stream with bundles marked.

However, this design proved too rigid because not every instruction was able to utilize every functional unit, leading to many useless NO-OPs used as filler, as illustrated in Fig. 8-2(b). Consequently, modern VLIW machines have a way of marking a bundle of instructions as belonging together, for example with an ''end-of-bundle'' bit, as shown in Fig. 8-2(c). The processor can then fetch the entire bundle and issue it all at once. It is up to the compiler to prepare bundles of compatible instructions.
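The difference between the two encodings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-slot layout (two integer units, one floating-point, one load, one store), the single-letter operation codes, and the function names `to_fixed_words`, `to_marked_stream`, and `fetch_bundles` are all hypothetical and do not describe any real instruction format.

```python
# Hypothetical machine: two integer units, one floating-point unit,
# one load unit, one store unit (an assumption made for this sketch).
SLOTS = ["I", "I", "F", "L", "S"]

def to_fixed_words(bundles):
    """Encode each bundle as a fixed five-slot word; unused slots become '-' (NO-OP)."""
    words = []
    for bundle in bundles:
        remaining = list(bundle)
        word = []
        for unit in SLOTS:
            if unit in remaining:
                remaining.remove(unit)
                word.append(unit)
            else:
                word.append("-")  # filler NO-OP
        if remaining:
            raise ValueError(f"bundle {bundle} does not fit the functional units")
        words.append("".join(word))
    return words

def to_marked_stream(bundles):
    """Encode bundles as a flat stream of (op, end_of_bundle) pairs with no filler."""
    stream = []
    for bundle in bundles:
        for i, op in enumerate(bundle):
            stream.append((op, i == len(bundle) - 1))
    return stream

def fetch_bundles(stream):
    """Split the flat stream back into bundles by watching the end-of-bundle bit."""
    bundles, current = [], []
    for op, end in stream:
        current.append(op)
        if end:
            bundles.append(current)
            current = []
    return bundles

bundles = [["L"], ["I", "L"], ["I", "I", "F", "S"], ["I", "F", "L", "S"]]
words = to_fixed_words(bundles)      # fixed-width words padded with NO-OPs
stream = to_marked_stream(bundles)   # compact stream with end-of-bundle bits
noops = sum(w.count("-") for w in words)
```

For these four bundles the fixed-width form occupies 20 slots, 9 of them filler, whereas the marked stream holds only the 11 real operations, and `fetch_bundles(stream)` recovers the original grouping.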

In effect, VLIW shifts the burden of determining which instructions can be issued together from run time to compile time. Not only does this choice make the hardware simpler and faster, but since an optimizing compiler can run for a long time if need be, it can assemble better bundles than the hardware could at run time. Of course, such a radical change in CPU architecture is difficult to introduce, as demonstrated by the slow acceptance of the Itanium except in niche applications.
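To make the compile-time side concrete, here is a minimal sketch of a greedy, in-order bundler in Python. It is not how any particular compiler works: the `Instr` record, the `UNITS` table, and `make_bundles` are hypothetical names, the unit counts are assumed, and the sketch ignores dependences through memory and assumes that all reads within a bundle happen before its writes.

```python
from collections import Counter, namedtuple

# Hypothetical instruction record: functional unit, destination register, sources.
Instr = namedtuple("Instr", "unit dest srcs")

# Assumed unit counts: two integer, one floating-point, one load, one store.
UNITS = {"I": 2, "F": 1, "L": 1, "S": 1}

def make_bundles(program):
    """Greedily pack consecutive instructions into bundles.

    A new bundle is started when the needed unit is already taken or when the
    instruction reads or rewrites a register written earlier in the bundle.
    """
    bundles, current = [], []
    used, written = Counter(), set()
    for ins in program:
        if ins.unit not in UNITS:
            raise ValueError(f"unknown functional unit {ins.unit!r}")
        conflict = (
            used[ins.unit] >= UNITS[ins.unit]        # no free unit of this type
            or any(s in written for s in ins.srcs)   # needs a result from this bundle
            or ins.dest in written                   # would overwrite a result
        )
        if conflict and current:
            bundles.append(current)
            current, used, written = [], Counter(), set()
        current.append(ins)
        used[ins.unit] += 1
        if ins.dest is not None:
            written.add(ins.dest)
    if current:
        bundles.append(current)
    return bundles

program = [
    Instr("L", "r1", ("r0",)),   # load into r1
    Instr("I", "r2", ("r0",)),   # independent integer op
    Instr("F", "f1", ("f0",)),   # independent floating-point op
    Instr("I", "r3", ("r1",)),   # needs the loaded value
    Instr("S", None, ("r3",)),   # stores the result of the previous op
]
layout = [[ins.unit for ins in b] for b in make_bundles(program)]
```

Here the first three instructions share a bundle, while the last two each start a new one because they depend on an earlier result. An optimizing compiler with more time could also reorder instructions to fill bundles better, which this in-order sketch does not attempt.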

It is worth noting in passing that instruction-level parallelism is not the only form of low-level parallelism. Another is memory-level parallelism, in which multiple memory operations are in flight at the same time (Chou et al., 2004).
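A toy calculation hints at why overlapping memory operations matters. The model below is a deliberate simplification and an assumption of this sketch, not something taken from the cited paper: every miss is independent, each costs the same fixed latency, and at most `max_in_flight` misses can be outstanding at once. The function name and the numbers are hypothetical.

```python
import math

def stall_cycles(misses, latency, max_in_flight):
    """Cycles spent waiting on independent misses, issued in full batches."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    return math.ceil(misses / max_in_flight) * latency

serial = stall_cycles(8, 200, 1)       # one miss at a time
overlapped = stall_cycles(8, 200, 4)   # up to four misses in flight
```

Under these assumptions, eight misses of 200 cycles each cost 1600 cycles when handled one at a time but 400 cycles when four can overlap.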
