perhaps even performing asynchronously with respect to the rest of the CPU and
fetching one or more instructions ahead.
One of the most time-consuming phases of the execution of many instructions
is fetching a 2-byte offset, extending it appropriately, and accumulating it in the H
register in preparation for an addition, for example, in a branch to PC ± n bytes.
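To make this concrete, here is a minimal, self-contained C sketch of forming a branch target from a signed 2-byte offset. The toy memory contents and the opcode byte are invented for illustration, and the real datapath accumulates the sum in H one byte at a time rather than in a single expression:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy memory: a branch opcode at address 4 followed by the
       big-endian 2-byte offset 0xFFFE, i.e., -2. */
    static const uint8_t mem[8] = { 0, 0, 0, 0, 0xA7, 0xFF, 0xFE, 0 };

    /* Fetch the 2-byte offset at pc+1, sign-extend it to 32 bits,
       and form the branch target PC +/- n bytes. */
    static uint32_t branch_target(uint32_t pc)
    {
        int16_t offset = (int16_t)((mem[pc + 1] << 8) | mem[pc + 2]);
        return pc + (int32_t)offset;   /* sign extension makes
                                          backward branches work */
    }

    int main(void)
    {
        printf("target = %u\n", branch_target(4));   /* prints 2 */
        return 0;
    }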
One potential solution—making the memory port 16 bits wide—greatly complicates the operation, because the memory is actually 32 bits wide. The 16 bits
needed might span word boundaries, so that even a single read of 32 bits will not
necessarily fetch both bytes needed.
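For concreteness, two consecutive bytes starting at byte address a lie within a single 32-bit word only when a mod 4 is not 3. A small, purely illustrative check:

    #include <stdint.h>
    #include <stdbool.h>

    /* Two consecutive bytes starting at addr fit in one 32-bit word
       unless addr is the last byte of its word.  When this returns
       true, even a single 32-bit read cannot deliver both bytes. */
    bool spans_word_boundary(uint32_t addr)
    {
        return (addr % 4) == 3;
    }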
Overlapping the execution of instructions is by far the most interesting approach and offers the most opportunity for dramatic increases in speed. Simple overlap of instruction fetch and execution is surprisingly effective. More sophisticated techniques go much further, however, overlapping the execution of many instructions. In fact, this idea is at the heart of modern computer design. We will discuss some of the basic techniques for overlapping instruction execution below and motivate some of the more sophisticated ones.
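As a rough sketch of the simplest form of overlap, the loop below fetches instruction i+1 while instruction i is "executing." The fetch and execute stand-ins are invented for illustration; in hardware the two statements in the loop body proceed during the same clock cycle, while sequential C can only show the dependence pattern:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-ins for the memory port and the execution unit. */
    static uint32_t fetch(uint32_t pc)      { return pc * 10; }
    static void     execute(uint32_t instr) { printf("exec %u\n", instr); }

    int main(void)
    {
        uint32_t pc = 0;
        uint32_t next = fetch(pc);      /* prime the overlap */
        for (int i = 0; i < 5; i++) {
            uint32_t instr = next;
            next = fetch(++pc);         /* fetch ahead ...      */
            execute(instr);             /* ... while executing  */
        }
        return 0;
    }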
Speed is half the picture; cost is the other half. Cost can also be measured in a
variety of ways, but a precise definition of cost is problematic. Some measures are
as simple as a count of the number of components. This was particularly true in
the days when processors were built of discrete components that were purchased
and assembled. Today, the entire processor exists on a single chip, but bigger,
more complex chips are much more expensive than smaller, simpler ones. Individual components—for example, transistors, gates, or functional units—can be
counted, but often the count is not as important as the amount of area required on
the integrated circuit. The more area required for the functions included, the larger
the chip. And the manufacturing cost of the chip grows much faster than its area.
For this reason, designers often speak of cost in terms of "real estate," that is, the area required for a circuit (presumably measured in pico-acres).
One of the most thoroughly studied circuits in history is the binary adder.
There have been thousands of designs, and the fastest ones are much quicker than
the slowest ones. They are also far more complex. The system designer has to
decide whether the greater speed is worth the real estate.
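The trade-off shows up even in software form. The loop below mimics the slowest design, the ripple-carry adder, in which each bit position must wait for the carry from the position before it, so the delay grows with the word length; carry-lookahead designs compute the carries in parallel, buying speed with considerably more circuitry. This is a software analogy, not a circuit description:

    #include <stdint.h>
    #include <stdio.h>

    /* Bit-serial addition: cheap in "real estate" but slow, since
       the carry must ripple through all 32 positions in turn. */
    static uint32_t ripple_add(uint32_t a, uint32_t b)
    {
        uint32_t sum = 0, carry = 0;
        for (int i = 0; i < 32; i++) {
            uint32_t ai = (a >> i) & 1, bi = (b >> i) & 1;
            sum   |= (ai ^ bi ^ carry) << i;          /* sum bit   */
            carry  = (ai & bi) | (carry & (ai ^ bi)); /* carry out */
        }
        return sum;
    }

    int main(void)
    {
        printf("%u\n", ripple_add(1234567, 7654321)); /* 8888888 */
        return 0;
    }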
Adders are not the only component with many choices. Nearly every compo-
nent in the system can be designed to run faster or slower, with a cost differential.
The challenge to the designer is to identify the components in the system to speed
up in order to improve the system the most. Interestingly enough, many an individual component can be replaced with a much faster one with little or no effect on overall system speed. In the following sections we will look at some of the design issues
and the corresponding trade-offs.
A key factor in determining how fast the clock can run is the amount of work
that must be done on each clock cycle. Obviously, the more work to be done, the
longer the clock cycle. It's not quite that simple, of course, because the hardware is quite good at doing things in parallel, so it's actually the longest sequence of operations that must be performed serially in a single cycle that determines how long the clock cycle must be.
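As a worked example with made-up stage delays, suppose reading the registers takes 300 ps, the ALU 500 ps, the shifter 100 ps, and writing the result back 200 ps, and that these four steps form the longest serial chain in a cycle. Work done in parallel alongside them adds nothing to the period; only this critical path does:

    #include <stdio.h>

    /* Illustrative (made-up) stage delays in picoseconds. */
    enum { REG_READ = 300, ALU = 500, SHIFT = 100, REG_WRITE = 200 };

    int main(void)
    {
        int critical_path = REG_READ + ALU + SHIFT + REG_WRITE;
        printf("min cycle = %d ps (about %.2f GHz)\n",
               critical_path, 1000.0 / critical_path);  /* 1100 ps */
        return 0;
    }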