that must be performed serially in a single clock cycle that determines how long
the clock cycle must be.
One aspect that can be controlled is the amount of decoding that must be
performed. Recall, for example, that in Fig. 4-6 we saw that while any of nine
registers could be read into the ALU from the B bus, we required only 4 bits in the
microinstruction word to specify which register was to be selected. Unfortunately,
these savings come at a price. The decode circuit adds delay in the critical path. It
means that whichever register is to enable its data onto the B bus will receive that
command slightly later and will get its data on the bus slightly later. This effect
cascades, with the ALU receiving its inputs a little later and producing its results a
little later. Finally, the result is available on the C bus to be written to the registers
a little later. Since this delay often is the factor that determines how long the clock
cycle must be, this may mean that the clock cannot run quite as fast, and the entire
computer must run a little slower. Thus there is a trade-off between speed and
cost. Reducing the control store by 5 bits per word (a 4-bit encoded field instead
of nine individual select bits) comes at the cost of slowing
down the clock. The design engineer must take the design objectives into account
when deciding which is the right choice. For a high-performance implementation,
using a decoder is probably not a good idea; for a low-cost one, it might be.
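The trade-off can be made concrete with a small sketch. The Python below models the B bus select field both ways: as an encoded field that passes through a decoder, and as one select bit per register. The function names and all delay figures are assumptions made up for illustration; only the register count (nine) and the resulting field widths come from the text.

```python
def encoded_width(n_choices: int) -> int:
    """Bits needed for a binary-encoded field naming one of n_choices."""
    return (n_choices - 1).bit_length()


def decode(field: int, n_outputs: int) -> list[int]:
    """Model a decoder: expand a binary field into one-hot enable lines."""
    if not 0 <= field < n_outputs:
        raise ValueError("field does not select an existing register")
    return [1 if i == field else 0 for i in range(n_outputs)]


def clock_period_ps(stage_delays_ps: list[int]) -> int:
    """Serial stages in one cycle add up to the minimum clock period."""
    return sum(stage_delays_ps)


N_B_BUS_REGISTERS = 9                              # from the text
one_hot_bits = N_B_BUS_REGISTERS                   # one select bit per register
encoded_bits = encoded_width(N_B_BUS_REGISTERS)    # field fed to the decoder
saved_bits = one_hot_bits - encoded_bits           # control store bits saved per word

# Hypothetical delays in picoseconds, NOT taken from the text.
DECODER_DELAY_PS = 100
DATA_PATH_PS = [200, 500, 300]   # register enable, ALU and shifter, write-back

period_without_decoder = clock_period_ps(DATA_PATH_PS)
period_with_decoder = clock_period_ps([DECODER_DELAY_PS] + DATA_PATH_PS)

if __name__ == "__main__":
    print(f"encoded field: {encoded_bits} bits, saved per word: {saved_bits}")
    print(f"enable lines for register 2: {decode(2, N_B_BUS_REGISTERS)}")
    print(f"period: {period_without_decoder} ps vs {period_with_decoder} ps")
```

With these made-up delays the decoder lengthens the cycle by its own delay, which is the cost paid for the narrower microinstruction word.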
4.4.2 Reducing the Execution Path Length
The Mic-1 was designed to be both moderately simple and moderately fast, although
there is admittedly an enormous tension between these two goals. Briefly
stated, simple machines are not fast and fast machines are not simple. The Mic-1
CPU also uses a minimum amount of hardware: 10 registers, the simple ALU of
Fig. 3-19 replicated 32 times, a shifter, a decoder, a control store, and a bit of glue
here and there. The whole system could be built with fewer than 5000 transistors
plus whatever the control store (ROM) and main memory (RAM) take.
Having seen how IJVM can be implemented in a straightforward way in
microcode with little hardware, let us now look at alternative, faster
implementations. We will next look at ways to reduce the number of microinstructions
per ISA instruction (i.e., reducing the execution path length). After that, we will
consider other approaches.
Merging the Interpreter Loop with the Microcode
In the Mic-1, the main loop consists of one microinstruction that must be executed
at the beginning of every IJVM instruction. In some cases it is possible to
overlap it with the previous instruction. In fact, this has already been partially
accomplished. Notice that when Main1 is executed, the opcode to be interpreted is
already in MBR. It is there because it was fetched either by the previous main loop
(if the previous instruction had no operands) or during the execution of the previous
instruction.
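To see what overlapping the main loop buys, the following Python sketch counts microinstruction cycles for a short instruction trace. It assumes a simplified cost model: each instruction costs its body plus one main-loop cycle, and an instruction that can absorb the next main-loop step saves exactly one cycle. The opcode names OP_X and OP_Y and their body lengths are hypothetical placeholders, not IJVM figures from the text.

```python
def total_cycles(trace, body_cycles, can_overlap=frozenset()):
    """Count cycles for a trace of instructions.

    trace        -- sequence of opcode names, in execution order
    body_cycles  -- microinstructions in each opcode's body (main loop excluded)
    can_overlap  -- opcodes assumed able to perform the main-loop step
                    during one of their own cycles
    """
    cycles = 0
    for op in trace:
        cycles += body_cycles[op] + 1      # body plus one main-loop cycle
        if op in can_overlap:
            cycles -= 1                    # main-loop step hidden in the body
    return cycles


# Hypothetical opcodes and body lengths, for illustration only.
BODY_CYCLES = {"OP_X": 3, "OP_Y": 2}
TRACE = ["OP_X", "OP_Y", "OP_X", "OP_X"]

baseline = total_cycles(TRACE, BODY_CYCLES)
merged = total_cycles(TRACE, BODY_CYCLES, can_overlap={"OP_X"})

if __name__ == "__main__":
    print(f"separate main loop: {baseline} cycles")
    print(f"main loop merged into OP_X: {merged} cycles")
```

Under this model the saving is one cycle per execution of an instruction that permits the overlap, so the benefit depends on how often such instructions appear in the trace.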
 
 