perhaps even performing asynchronously with respect to the rest of the CPU and
fetching one or more instructions ahead.
One of the most time-consuming phases of the execution of many instructions
is fetching a 2-byte offset, extending it appropriately, and accumulating it in the H
register in preparation for an addition, for example, in a branch to PC ± n bytes.
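To make this concrete, here is a minimal, self-contained C sketch of forming a branch target from a signed 2-byte offset. The toy memory contents and the opcode byte are invented for illustration, and the real datapath accumulates the sum in H one byte at a time rather than in a single expression:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy memory: a branch opcode at address 4 followed by the
       big-endian 2-byte offset 0xFFFE, i.e., -2. */
    static const uint8_t mem[8] = { 0, 0, 0, 0, 0xA7, 0xFF, 0xFE, 0 };

    /* Fetch the 2-byte offset at pc+1, sign-extend it to 32 bits,
       and form the branch target PC +/- n bytes. */
    static uint32_t branch_target(uint32_t pc)
    {
        int16_t offset = (int16_t)((mem[pc + 1] << 8) | mem[pc + 2]);
        return pc + (int32_t)offset;   /* sign extension makes
                                          backward branches work */
    }

    int main(void)
    {
        printf("target = %u\n", branch_target(4));   /* prints 2 */
        return 0;
    }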
One potential solution—making the memory port 16 bits wide—greatly complicates the operation, because the memory is actually 32 bits wide. The 16 bits
needed might span word boundaries, so that even a single read of 32 bits will not
necessarily fetch both bytes needed.
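For concreteness, two consecutive bytes starting at byte address a lie within a single 32-bit word only when a mod 4 is not 3. A small, purely illustrative check:

    #include <stdint.h>
    #include <stdbool.h>

    /* Two consecutive bytes starting at addr fit in one 32-bit word
       unless addr is the last byte of its word.  When this returns
       true, even a single 32-bit read cannot deliver both bytes. */
    bool spans_word_boundary(uint32_t addr)
    {
        return (addr % 4) == 3;
    }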
Overlapping the execution of instructions is by far the most interesting approach and offers the most opportunity for dramatic increases in speed. Simple overlap of instruction fetch and execution is surprisingly effective. More sophisticated techniques go much further, however, overlapping the execution of many instructions. In fact, this idea is at the heart of modern computer design. We will discuss some of the basic techniques for overlapping instruction execution below and motivate some of the more sophisticated ones.
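As a rough sketch of the simplest form of overlap, the loop below fetches instruction i+1 while instruction i is "executing." The fetch and execute stand-ins are invented for illustration; in hardware the two statements in the loop body proceed during the same clock cycle, while sequential C can only show the dependence pattern:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-ins for the memory port and the execution unit. */
    static uint32_t fetch(uint32_t pc)      { return pc * 10; }
    static void     execute(uint32_t instr) { printf("exec %u\n", instr); }

    int main(void)
    {
        uint32_t pc = 0;
        uint32_t next = fetch(pc);      /* prime the overlap */
        for (int i = 0; i < 5; i++) {
            uint32_t instr = next;
            next = fetch(++pc);         /* fetch ahead ...      */
            execute(instr);             /* ... while executing  */
        }
        return 0;
    }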
Speed is half the picture; cost is the other half. Cost can also be measured in a
variety of ways, but a precise definition of cost is problematic. Some measures are
as simple as a count of the number of components. This was particularly true in
the days when processors were built of discrete components that were purchased
and assembled. Today, the entire processor exists on a single chip, but bigger,
more complex chips are much more expensive than smaller, simpler ones. Individual components—for example, transistors, gates, or functional units—can be
counted, but often the count is not as important as the amount of area required on
the integrated circuit. The more area required for the functions included, the larger
the chip. And the manufacturing cost of the chip grows much faster than its area.
For this reason, designers often speak of cost in terms of "real estate," that is, the area required for a circuit (presumably measured in pico-acres).
One of the most thoroughly studied circuits in history is the binary adder.
There have been thousands of designs, and the fastest ones are much quicker than
the slowest ones. They are also far more complex. The system designer has to
decide whether the greater speed is worth the real estate.
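The trade-off shows up even in software form. The loop below mimics the slowest design, the ripple-carry adder, in which each bit position must wait for the carry from the position before it, so the delay grows with the word length; carry-lookahead designs compute the carries in parallel, buying speed with considerably more circuitry. This is a software analogy, not a circuit description:

    #include <stdint.h>
    #include <stdio.h>

    /* Bit-serial addition: cheap in "real estate" but slow, since
       the carry must ripple through all 32 positions in turn. */
    static uint32_t ripple_add(uint32_t a, uint32_t b)
    {
        uint32_t sum = 0, carry = 0;
        for (int i = 0; i < 32; i++) {
            uint32_t ai = (a >> i) & 1, bi = (b >> i) & 1;
            sum   |= (ai ^ bi ^ carry) << i;          /* sum bit   */
            carry  = (ai & bi) | (carry & (ai ^ bi)); /* carry out */
        }
        return sum;
    }

    int main(void)
    {
        printf("%u\n", ripple_add(1234567, 7654321)); /* 8888888 */
        return 0;
    }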
Adders are not the only component with many choices. Nearly every compo-
nent in the system can be designed to run faster or slower, with a cost differential.
The challenge to the designer is to identify the components in the system to speed
up in order to improve the system the most. Interestingly enough, many an individual component can be replaced with a much faster one with little or no effect on overall system speed. In the following sections we will look at some of the design issues
and the corresponding trade-offs.
A key factor in determining how fast the clock can run is the amount of work
that must be done on each clock cycle. Obviously, the more work to be done, the
longer the clock cycle. It's not quite that simple, of course, because the hardware is quite good at doing things in parallel, so it's actually the longest sequence of operations that must be performed serially in a single cycle that determines how long the clock cycle must be.
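As a worked example with made-up stage delays, suppose reading the registers takes 300 ps, the ALU 500 ps, the shifter 100 ps, and writing the result back 200 ps, and that these four steps form the longest serial chain in a cycle. Work done in parallel alongside them adds nothing to the period; only this critical path does:

    #include <stdio.h>

    /* Illustrative (made-up) stage delays in picoseconds. */
    enum { REG_READ = 300, ALU = 500, SHIFT = 100, REG_WRITE = 200 };

    int main(void)
    {
        int critical_path = REG_READ + ALU + SHIFT + REG_WRITE;
        printf("min cycle = %d ps (about %.2f GHz)\n",
               critical_path, 1000.0 / critical_path);  /* 1100 ps */
        return 0;
    }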