THE MICROARCHITECTURE LEVEL - Structured Computer Organization

that must be performed serially in a single clock cycle that determines how long the clock cycle must be.

One aspect that can be controlled is the amount of decoding that must be performed. Recall, for example, that in Fig. 4-6 we saw that while any of nine registers could be read into the ALU from the B bus, we required only 4 bits in the microinstruction word to specify which register was to be selected. Unfortunately, these savings come at a price. The decode circuit adds delay in the critical path. It means that whichever register is to enable its data onto the B bus will receive that command slightly later and will get its data on the bus slightly later. This effect cascades, with the ALU receiving its inputs a little later and producing its results a little later. Finally, the result is available on the C bus to be written to the registers a little later. Since this delay often is the factor that determines how long the clock cycle must be, this may mean that the clock cannot run quite as fast, and the entire computer must run a little slower. Thus there is a trade-off between speed and cost. Reducing the control store by 5 bits per word comes at the cost of slowing down the clock. The design engineer must take the design objectives into account when deciding which is the right choice. For a high-performance implementation, using a decoder is probably not a good idea; for a low-cost one, it might be.
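
The arithmetic behind this trade-off can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the register names `R0` to `R8` are placeholders rather than the actual Mic-1 assignments, the helper names `encoded_width` and `decode` are invented for this sketch, and the delay figures are hypothetical values chosen to show the direction of the effect, not measurements.

```python
# Sketch: encoded vs. one-hot selection of the B-bus source register.
# Register names are placeholders, not the actual Mic-1 assignments.
B_BUS_SOURCES = [f"R{i}" for i in range(9)]  # nine B-bus sources, per the text


def encoded_width(n_sources: int) -> int:
    """Bits needed to give each of n_sources a distinct binary code."""
    return max(1, (n_sources - 1).bit_length())


def decode(code: int, n_outputs: int) -> list[int]:
    """Model a decoder: drive exactly one of n_outputs enable lines high."""
    if not 0 <= code < n_outputs:
        raise ValueError(f"code {code} selects no register")
    return [1 if i == code else 0 for i in range(n_outputs)]


one_hot_bits = len(B_BUS_SOURCES)           # one enable bit per register: 9
encoded_bits = encoded_width(one_hot_bits)  # binary-coded field: 4
bits_saved = one_hot_bits - encoded_bits    # 5 bits per control-store word

# Hypothetical delays in nanoseconds, chosen only to illustrate the trade-off.
DATA_PATH_NS = 10.0  # assumed critical path without a decoder
DECODER_NS = 0.5     # assumed extra delay added by the decode circuit
cycle_without_decoder = DATA_PATH_NS
cycle_with_decoder = DATA_PATH_NS + DECODER_NS

enables = decode(4, one_hot_bits)  # select the placeholder register R4
print(encoded_bits, bits_saved, enables)
```

The decoder shortens every control-store word by `bits_saved` bits, but its delay is added to the critical path on every cycle, which is the cost the paragraph above describes.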

The Mic-1 was designed to be both moderately simple and moderately fast, although there is admittedly an enormous tension between these two goals. Briefly stated, simple machines are not fast and fast machines are not simple. The Mic-1 CPU also uses a minimum amount of hardware: 10 registers, the simple ALU of Fig. 3-19 replicated 32 times, a shifter, a decoder, a control store, and a bit of glue here and there. The whole system could be built with fewer than 5000 transistors plus whatever the control store (ROM) and main memory (RAM) take.

Having seen how IJVM can be implemented in a straightforward way in microcode with little hardware, let us now look at alternative, faster implementations. We will next look at ways to reduce the number of microinstructions per ISA instruction (i.e., reducing the execution path length). After that, we will consider other approaches.

Merging the Interpreter Loop with the Microcode

In the Mic-1, the main loop consists of one microinstruction that must be executed at the beginning of every IJVM instruction. In some cases it is possible to overlap it with the previous instruction. In fact, this has already been partially accomplished. Notice that when Main1 is executed, the opcode to be interpreted is already in MBR. It is there because it was fetched either by the previous main loop (if the previous instruction had no operands) or during the execution of the previous instruction.
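
The effect of overlapping the main loop with the previous instruction can be sketched as a simple cycle count. The function `total_cycles` and the three-instruction sequence below are hypothetical and do not correspond to actual Mic-1 microcode; the only figure taken from the text is that the main loop costs one microinstruction per IJVM instruction.

```python
MAIN_LOOP_CYCLES = 1  # Main1 is a single microinstruction, per the text


def total_cycles(program):
    """Count microinstruction cycles for a sequence of ISA instructions.

    program is a list of (body_cycles, main_loop_merged) pairs, where
    body_cycles excludes the main-loop step and main_loop_merged is True
    when that step was overlapped with the previous instruction.
    """
    total = 0
    for body_cycles, merged in program:
        total += body_cycles + (0 if merged else MAIN_LOOP_CYCLES)
    return total


# Hypothetical body lengths; only the second instruction's main-loop
# step is assumed to be mergeable into its predecessor.
separate = [(3, False), (1, False), (2, False)]
merged = [(3, False), (1, True), (2, False)]

print(total_cycles(separate), total_cycles(merged))
```

Each merged main-loop step removes one cycle from the execution path, so the saving is proportionally largest for instructions whose bodies are short.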
