4.4 DESIGN OF THE MICROARCHITECTURE LEVEL
Like just about everything else in computer science, the design of the
microarchitecture level is full of trade-offs. Computers have many desirable
characteristics, including speed, cost, reliability, ease of use, energy
requirements, and physical size. However, one trade-off drives the most
important choices the CPU designer must make: speed versus cost. In this
section we will look at this issue in detail to see what can be traded off
against what, how high performance can be achieved, and at what price in
hardware and complexity.
4.4.1 Speed versus Cost
While faster technology has resulted in the greatest speedup over any period of
time, that is beyond the scope of this text. Speed improvements due to
organization, while less amazing than those due to faster circuits, have
nevertheless been impressive. Speed can be measured in a variety of ways, but
given a circuit technology and an ISA, there are three basic approaches for
increasing the speed of execution:
1. Reduce the number of clock cycles needed to execute an instruction.
2. Simplify the organization so that the clock cycle can be shorter.
3. Overlap the execution of instructions.
The first two are obvious, but there is a surprising variety of design
opportunities that can dramatically affect either the number of clock cycles,
the clock period, or, most often, both. In this section, we will give an
example of how the encoding and decoding of an operation can affect the clock
cycle.
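The first two approaches act on two factors of the same product: execution time is roughly the number of instructions times the cycles per instruction times the clock period. The following is a minimal sketch of that arithmetic; the function name `execution_time_ns` and all numbers are made up for illustration and are not measurements of any real machine.

```python
def execution_time_ns(instructions, cycles_per_instruction, clock_period_ns):
    """Total time = instruction count x cycles per instruction x clock period."""
    return instructions * cycles_per_instruction * clock_period_ns


# Hypothetical baseline: 1000 instructions, 4 cycles each, 10 ns clock period.
baseline = execution_time_ns(1_000, 4, 10)

# Approach 1: fewer cycles per instruction, same clock period.
fewer_cycles = execution_time_ns(1_000, 3, 10)

# Approach 2: same cycle count, shorter clock period.
shorter_clock = execution_time_ns(1_000, 4, 8)

# A design change often moves both factors at once: here the hardware that
# saves a cycle is assumed to lengthen the clock period from 10 ns to 12 ns.
fewer_cycles_slower_clock = execution_time_ns(1_000, 3, 12)

print(baseline, fewer_cycles, shorter_clock, fewer_cycles_slower_clock)
```

In the last case the saved cycle still pays off, but only partly, which is why the two factors have to be evaluated together.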
The number of clock cycles needed to execute a set of operations is known as
the path length. Sometimes the path length can be shortened by adding
specialized hardware. For example, by adding an incrementer (conceptually, an
adder with one side permanently wired to add 1) to PC, we no longer have to use
the ALU to advance PC, eliminating cycles. The price paid is more hardware.
However, this capability does not help as much as might be expected. For most
instructions, the cycles consumed incrementing the PC are also cycles in which
a read operation is being performed. The subsequent instruction could not be
executed earlier anyway because it depends on the data coming from the memory.
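To see why the incrementer helps less than expected, consider a toy model in which each clock cycle is represented by the set of activities that take place in it. This is only a sketch under stated assumptions: the labels (`alu_inc_pc`, `mem_read_wait`, `alu_op`) and the function names are hypothetical and do not correspond to any real microprogram.

```python
def path_length(cycles):
    """Path length = number of clock cycles in the sequence."""
    return len(cycles)


def add_incrementer(cycles):
    """Model a dedicated PC incrementer: the ALU no longer advances PC.

    A cycle disappears only if incrementing PC was the sole activity in it.
    """
    remaining = [activities - {"alu_inc_pc"} for activities in cycles]
    return [activities for activities in remaining if activities]


# Typical case from the text: PC is advanced in a cycle that is also
# spent waiting for a memory read, so the cycle cannot be dropped.
typical = [{"alu_inc_pc", "mem_read_wait"}, {"alu_op"}, {"alu_op"}]

# Hypothetical case where the increment occupies a cycle by itself.
standalone = [{"alu_inc_pc"}, {"alu_op"}]

print(path_length(typical), path_length(add_incrementer(typical)))
print(path_length(standalone), path_length(add_incrementer(standalone)))
```

In the typical sequence the path length is unchanged, because the cycle is still needed for the memory read; only the standalone case gets shorter.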
Reducing the number of instruction cycles necessary for fetching instructions
requires more than just an additional circuit to increment the PC. In order to
speed up the instruction fetching to any significant degree, the third
technique (overlapping the execution of instructions) must be exploited.
Separating out the circuitry for fetching the instructions (the 8-bit memory
port, and the MBR and PC registers) is most effective if the unit is made
functionally independent of the main data path. In this way, it can fetch the
next opcode or operand on its own,
 
 
 