[Figure 8-2 panels: (a) pipeline stages Fetch, Decode, Issue, and Retire, with Integer, Integer, Floating point, Load, and Store functional units; (b) VLIW instructions with unused slots filled by no-ops; (c) the instruction stream packed into bundles, each closed by an end-of-bundle marker.]
Figure 8-2. (a) A CPU pipeline. (b) A sequence of VLIW instructions. (c) An
instruction stream with bundles marked.
However, this design proved too rigid because not every instruction could make
use of every functional unit, so many slots had to be filled with useless NO-OPs,
as illustrated in Fig. 8-2(b). Consequently, modern VLIW machines have a way of
marking a bundle of instructions as belonging together, for example with an
"end-of-bundle" bit, as shown in Fig. 8-2(c). The processor can then fetch the
entire bundle and issue it all at once. It is up to the compiler to prepare
bundles of compatible instructions.
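As an illustration of the difference between Fig. 8-2(b) and 8-2(c), the sketch below converts fixed-width VLIW words into a stream that drops the filler no-ops and sets an end-of-bundle bit on the last operation of each word, then splits the stream back into bundles. The five-slot layout (two integer, one floating-point, one load, one store), the one-letter unit codes, and the names `Op`, `to_bundles`, and `from_bundles` are assumptions made for this example, not any real instruction encoding.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical slot layout for one fixed-width VLIW word (an assumption):
# two integer units, one floating-point unit, one load unit, one store unit.
SLOTS = ("I", "I", "F", "L", "S")
NOOP = "-"


@dataclass(frozen=True)
class Op:
    unit: str            # functional-unit class: "I", "F", "L", "S", or NOOP
    end_of_bundle: bool  # set on the last operation of each bundle


def to_bundles(words: list[str]) -> list[Op]:
    """Convert fixed-width VLIW words into a marked stream without filler no-ops."""
    stream: list[Op] = []
    for word in words:
        if len(word) != len(SLOTS):
            raise ValueError(f"expected {len(SLOTS)} slots, got {word!r}")
        for slot, unit in zip(SLOTS, word):
            if unit not in (NOOP, slot):
                raise ValueError(f"{unit!r} cannot occupy a {slot!r} slot in {word!r}")
        real = [unit for unit in word if unit != NOOP]
        if not real:
            # Keep one explicit no-op so an empty cycle is not silently dropped.
            real = [NOOP]
        for i, unit in enumerate(real):
            stream.append(Op(unit, end_of_bundle=(i == len(real) - 1)))
    return stream


def from_bundles(stream: list[Op]) -> list[list[str]]:
    """Recover the bundles by splitting the stream after each end-of-bundle bit."""
    bundles: list[list[str]] = []
    current: list[str] = []
    for op in stream:
        current.append(op.unit)
        if op.end_of_bundle:
            bundles.append(current)
            current = []
    return bundles


words = ["---L-", "IIF-S", "I-FLS", "-----"]
stream = to_bundles(words)
# prints: 20 slots in fixed form, 10 ops in bundled form
print(len(words) * len(SLOTS), "slots in fixed form,", len(stream), "ops in bundled form")
# prints: [['L'], ['I', 'I', 'F', 'S'], ['I', 'F', 'L', 'S'], ['-']]
print(from_bundles(stream))
```

In this toy stream, half of the fixed-width slots were filler; the marked form stores only real operations, plus one explicit no-op for the fully idle cycle.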
In effect, VLIW shifts the burden of determining which instructions can be
issued together from run time to compile time. Not only does this choice make the
hardware simpler and faster, but since an optimizing compiler can run for a long
time if need be, it can assemble better bundles than the hardware could at run
time. Of course, such a radical change in CPU architecture is difficult to
introduce, as demonstrated by the slow acceptance of the Itanium outside niche
applications.
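To make the compile-time side more concrete, here is a minimal sketch of how a compiler might pack a straight-line instruction sequence into bundles. The `Instr` record, the `bundle` function, the per-bundle unit capacities, and the register names are all hypothetical. The sketch keeps program order, ignores instruction latencies, and conservatively refuses to place two instructions with any register dependence in the same bundle. It is a simple greedy packer, not the algorithm used by any particular VLIW compiler.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical per-bundle capacity: two integer units, one floating-point
# unit, one load unit, one store unit.
CAPACITY = {"I": 2, "F": 1, "L": 1, "S": 1}


@dataclass(frozen=True)
class Instr:
    text: str                    # human-readable form, e.g. "add r3, r1, r2"
    unit: str                    # functional-unit class the instruction needs
    dst: str | None = None       # register written (None for stores)
    srcs: tuple[str, ...] = ()   # registers read


def bundle(program: list[Instr]) -> list[list[Instr]]:
    """Greedily pack instructions, in program order, into bundles.

    An instruction joins the current bundle only if a unit of its class is
    still free and it has no register dependence (RAW, WAR, or WAW) on an
    instruction already in the bundle; otherwise the bundle is closed.
    """
    bundles: list[list[Instr]] = []
    current: list[Instr] = []
    used: Counter[str] = Counter()
    written: set[str] = set()
    read: set[str] = set()

    for ins in program:
        conflict = (
            used[ins.unit] >= CAPACITY[ins.unit]                 # unit busy
            or any(r in written for r in ins.srcs)               # RAW
            or (ins.dst is not None
                and (ins.dst in written or ins.dst in read))     # WAW / WAR
        )
        if conflict and current:
            bundles.append(current)  # close the bundle (end-of-bundle bit)
            current, used, written, read = [], Counter(), set(), set()
        current.append(ins)
        used[ins.unit] += 1
        read.update(ins.srcs)
        if ins.dst is not None:
            written.add(ins.dst)
    if current:
        bundles.append(current)
    return bundles


prog = [
    Instr("ld r1, 0(r10)", "L", "r1", ("r10",)),
    Instr("ld r2, 8(r10)", "L", "r2", ("r10",)),
    Instr("add r3, r1, r2", "I", "r3", ("r1", "r2")),
    Instr("add r4, r10, 16", "I", "r4", ("r10",)),
    Instr("fmul f1, f2, f3", "F", "f1", ("f2", "f3")),
    Instr("st r3, 0(r11)", "S", None, ("r3", "r11")),
]
for n, b in enumerate(bundle(prog), 1):
    print(n, [i.text for i in b])
```

Because this packer never reorders, the independent `add r4` and `fmul` wait in the third bundle behind the loads and the dependent `add r3`. A compiler that is allowed to reorder, and to take its time doing so, could move them into the first bundle, which is the kind of improvement the text attributes to compile-time scheduling.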
It is worth noting in passing that instruction-level parallelism is not the only
form of low-level parallelism. Another is memory-level parallelism, in which
multiple memory operations are in flight at the same time (Chou et al., 2004).