THE MICROARCHITECTURE LEVEL - Structured Computer Organization

instruction set. They do not have to be broken up. They can be executed as is, each in a single data path cycle.

In contrast to the Core i7 and the OMAP4430, the ATmega168 is a simple machine indeed. It is more RISC-like than CISC-like because most of its simple instructions can be executed in one clock cycle and do not need to be decomposed. It has no pipelining and no caching, and it has in-order issue, in-order execution, and in-order retirement. In its simplicity, it is much akin to the Mic-1.

4.8 SUMMARY

The heart of every computer is the data path. It contains some registers, one, two, or three buses, and one or more functional units such as ALUs and shifters. The main execution loop consists of fetching some operands from the registers and sending them over the buses to the ALU and other functional units for execution. The results are then stored back in the registers.
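The execution loop described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the register names, the 32-bit word width, the set of ALU operations, and the function `data_path_cycle` are all hypothetical and are not taken from the Mic-1 or any real machine.

```python
# A minimal sketch of one data path cycle. All names here are hypothetical.
MASK32 = 0xFFFFFFFF  # assumed 32-bit word width

# Assumed ALU operations; a real ALU is selected by control bits, not strings.
ALU_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
}


def data_path_cycle(regs, src_a, src_b, op, dest, shift_left=0):
    """Read two registers, apply the ALU and the shifter, and write back."""
    a = regs[src_a]                            # operand placed on the first bus
    b = regs[src_b]                            # operand placed on the second bus
    result = ALU_OPS[op](a, b) & MASK32        # ALU stage, truncated to one word
    result = (result << shift_left) & MASK32   # optional shifter stage
    regs[dest] = result                        # result stored back in a register
    return result


regs = {"r0": 5, "r1": 7, "r2": 0}
data_path_cycle(regs, "r0", "r1", "add", "r2")                # r2 = 5 + 7
data_path_cycle(regs, "r2", "r0", "sub", "r2", shift_left=1)  # r2 = (12 - 5) << 1
```

Each call corresponds to one trip around the loop: operands out of the registers, through the functional units, and back into a register.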

The data path can be controlled by a sequencer that fetches microinstructions from a control store. Each microinstruction contains bits that control the data path for one cycle. These bits specify which operands to select, which operation to perform, and what to do with the results. In addition, each microinstruction specifies its successor, typically explicitly by containing its address. Some microinstructions modify this base address by ORing bits into the address before it is used.
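As a rough illustration of explicit successor addresses and of ORing bits into them, consider the following sketch. It is loosely modeled on the idea of a conditional microbranch, but the field names (`next_addr`, `jam_z`, `label`), the 9-bit address width, and the contents of the control store are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroInstr:
    next_addr: int       # explicit address of the successor microinstruction
    jam_z: bool = False  # if set, OR the ALU's zero flag into the high address bit
    label: str = ""      # description standing in for the data path control bits


HIGH_BIT = 0x100  # assumed: high-order bit of a 9-bit control store address


def next_address(mi, z_flag):
    """Start from the base address and OR in a condition bit when requested."""
    addr = mi.next_addr
    if mi.jam_z and z_flag:
        addr |= HIGH_BIT
    return addr


# Hypothetical control store: the two branch targets differ only in the high bit.
control_store = {
    0x000: MicroInstr(next_addr=0x001, label="fetch operands"),
    0x001: MicroInstr(next_addr=0x002, jam_z=True, label="test for zero"),
    0x002: MicroInstr(next_addr=0x000, label="result was nonzero"),
    0x102: MicroInstr(next_addr=0x000, label="result was zero"),
}


def run(start, z_flag, steps):
    """Follow successor addresses for a fixed number of microinstructions."""
    trace = []
    mpc = start
    for _ in range(steps):
        mi = control_store[mpc]
        trace.append(mi.label)
        mpc = next_address(mi, z_flag)
    return trace
```

Because the condition bit is ORed into the base address rather than replacing it, the two possible successors must sit at addresses that differ in exactly that bit, which is why the sketch places them at 0x002 and 0x102.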

The IJVM machine is a stack machine with 1-byte opcodes that push words onto the stack, pop words from the stack, and combine (e.g., add) words on the stack. A microprogrammed implementation was given for the Mic-1 microarchitecture. By adding an instruction fetch unit to preload the bytes in the instruction stream, many references to the program counter could be eliminated and the machine sped up considerably.
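The flavor of such a stack machine can be conveyed by a small interpreter. The sketch below handles only a handful of instructions, and the function `run_stack_machine` is hypothetical. The opcode values are the ones commonly listed for these instructions and should be treated as an assumption here. For brevity, results are not wrapped to a fixed word size.

```python
# Assumed 1-byte opcode values for a small subset of IJVM-style instructions.
BIPUSH, POP, DUP, IADD, ISUB = 0x10, 0x57, 0x59, 0x60, 0x64


def run_stack_machine(code):
    """Interpret a byte sequence and return the final operand stack."""
    stack = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        pc += 1
        if opcode == BIPUSH:
            byte = code[pc]          # the operand byte follows the opcode
            pc += 1
            # Sign-extend the byte before pushing it as a word.
            stack.append(byte - 256 if byte >= 128 else byte)
        elif opcode == POP:
            stack.pop()
        elif opcode == DUP:
            stack.append(stack[-1])
        elif opcode == IADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif opcode == ISUB:
            b = stack.pop()
            a = stack.pop()
            stack.append(a - b)
        else:
            raise ValueError(f"unknown opcode 0x{opcode:02X}")
    return stack


# Push 7 and 5, subtract, duplicate the result, and add the two copies.
program = bytes([BIPUSH, 7, BIPUSH, 5, ISUB, DUP, IADD])
final_stack = run_stack_machine(program)
```

Note how the interpreter touches `pc` for every byte it consumes. An instruction fetch unit that preloads these bytes is aimed at exactly this kind of repeated program counter traffic.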

There are many ways to design the microarchitecture level. Many trade-offs exist, including two-bus versus three-bus designs, encoded versus decoded microinstruction fields, presence or absence of prefetching, shallow or deep pipelines, and much more. The Mic-1 is a simple, software-controlled machine with sequential execution and no parallelism. In contrast, the Mic-4 is a highly parallel microarchitecture with a seven-stage pipeline.

Performance can be improved in a variety of ways. Cache memory is a major one. Direct-mapped caches and set-associative caches are commonly used to speed up memory references. Branch prediction, both static and dynamic, is important, as are out-of-order execution and speculative execution.
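To make the direct-mapped case concrete, here is a minimal sketch of how such a cache might decide between a hit and a miss. The class `DirectMappedCache`, its default sizes (8 lines of 16 bytes), and the hit and miss counters are assumptions chosen for illustration. The sketch tracks only tags, not the cached data itself.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (hypothetical parameters)."""

    def __init__(self, num_lines=8, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one tag per line; None means invalid
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        block = address // self.line_size  # which memory block holds the byte
        index = block % self.num_lines     # the single line this block may use
        tag = block // self.num_lines      # distinguishes blocks sharing a line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag             # evict whatever was in the line
        self.misses += 1
        return False


cache = DirectMappedCache()
results = [cache.access(a) for a in (0, 4, 128, 0)]
```

In this run, addresses 0 and 4 fall in the same line, so the second access hits. Address 128 maps to the same line with a different tag and evicts it, so the final access to address 0 misses again. A set-associative cache reduces this kind of conflict by allowing several candidate lines per index.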

Our three example machines, the Core i7, OMAP4430, and ATmega168, all have microarchitectures not visible to the ISA assembly-language programmers. The Core i7 has a complex scheme for converting the ISA instructions into micro-operations, caching them, and feeding them into a superscalar RISC core for out-of-order execution, register renaming, and every other trick in the book to get
