[Figure: pipeline stages S1 (instruction fetch unit), S2 (instruction decode unit), S3 (operand fetch unit), S4 (functional units: ALU, ALU, LOAD, STORE, floating point), and S5 (write back unit).]
Figure 2-6. A superscalar processor with five functional units.
idea. In reality, most of the functional units in stage 4 take appreciably longer than
one clock cycle to execute, certainly the ones that access memory or do
floating-point arithmetic. As can be seen from the figure, it is possible to have multiple
ALUs in stage S4.
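To see why multi-cycle units in S4 call for more than one unit of a kind, here is a rough sketch under a deliberately simple model that is not taken from the text: each functional unit is busy for its full latency and is not internally pipelined, and the earlier stages deliver at most issue_rate instructions per cycle. The function name and its parameters are hypothetical, chosen only for illustration.

```python
def sustained_throughput(num_units: int, latency_cycles: int,
                         issue_rate: float = 1.0) -> float:
    """Instructions per cycle that a group of identical S4 units can sustain.

    Assumed model (not from the text): a unit accepts a new instruction
    only after finishing the previous one, so each unit completes
    1 / latency_cycles instructions per cycle.
    """
    if num_units < 1 or latency_cycles < 1:
        raise ValueError("num_units and latency_cycles must be at least 1")
    # The units cannot finish more work than the earlier stages supply.
    return min(issue_rate, num_units / latency_cycles)


# One unit that needs 2 cycles keeps up with only half the issue rate;
# a second identical unit restores one instruction per cycle.
for units in (1, 2):
    print(units, "unit(s), 2-cycle latency:", sustained_throughput(units, 2))
```

Under this model, a unit that takes k cycles needs about k copies to keep pace with stages that issue one instruction per cycle. This is one way to read the duplicated ALUs in the figure.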
2.1.6 Processor-Level Parallelism
The demand for ever faster computers seems to be insatiable. Astronomers
want to simulate what happened in the first microsecond after the big bang,
economists want to model the world economy, and teenagers want to play 3D
interactive multimedia games over the Internet with their virtual friends. While
CPUs keep getting faster, eventually they are going to run into problems with
the speed of light, which is likely to stay at about 20 cm/nanosecond in copper wire or
optical fiber, no matter how clever Intel's engineers are. Faster chips also produce
more heat, whose dissipation is a huge problem. In fact, the difficulty of getting
rid of the heat produced is the main reason CPU clock speeds have stagnated in the
past decade.
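The speed-of-light limit can be made concrete with a little arithmetic. The sketch below uses the 20 cm/nanosecond figure quoted above to estimate how far a signal can travel during one clock cycle. The function name and the sample clock rates are illustrative assumptions, not values from the text.

```python
# Signal speed in copper wire or optical fiber, as quoted in the text.
SIGNAL_SPEED_CM_PER_NS = 20.0


def distance_per_cycle_cm(clock_ghz: float) -> float:
    """Distance in cm that a signal can cover in one clock cycle."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    cycle_time_ns = 1.0 / clock_ghz  # a 1 GHz clock has a 1 ns cycle
    return SIGNAL_SPEED_CM_PER_NS * cycle_time_ns


# Sample clock rates chosen only for illustration.
for ghz in (1.0, 2.0, 4.0, 10.0):
    print(f"{ghz:5.1f} GHz: {distance_per_cycle_cm(ghz):5.1f} cm per cycle")
```

The faster the clock, the shorter the distance a signal can cover within a cycle, so the components that must communicate within one cycle have to sit physically closer together.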
Instruction-level parallelism helps a little, but pipelining and superscalar
operation rarely win more than a factor of five or ten. To get gains of 50, 100, or more,
the only way is to design computers with multiple CPUs, so we will now take a
look at how some of these are organized.
 
 