instruction set. They do not have to be broken up. They can be executed as is,
each in a single data path cycle.
In contrast to the Core i7 and the OMAP4430, the ATmega168 is a simple machine
indeed. It is more RISC-like than CISC-like because most of its simple instructions
can be executed in one clock cycle and do not need to be decomposed.
It has no pipelining and no caching, and it has in-order issue, in-order execute, and
in-order retirement. In its simplicity, it is much akin to the Mic-1.
4.8 SUMMARY
The heart of every computer is the data path. It contains some registers, one,
two, or three buses, and one or more functional units such as ALUs and shifters.
The main execution loop consists of fetching some operands from the registers and
sending them over the buses to the ALU and other functional units for execution.
The results are then stored back in the registers.
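The execution loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any particular machine: the register names, the 32-bit word size, the set of ALU operations, and the function data_path_cycle are all assumptions made for the example.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit word size

# Hypothetical ALU operations; a real ALU is selected by control bits.
ALU_OPS = {
    "add": lambda a, b: (a + b) & WORD_MASK,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "pass_a": lambda a, b: a,
}


def data_path_cycle(regs, src_a, src_b, op, shift_left, dest):
    """Run one cycle: read two registers, apply ALU and shifter, write back."""
    a = regs[src_a]                 # operand placed on the first input bus
    b = regs[src_b]                 # operand placed on the second input bus
    result = ALU_OPS[op](a, b)      # functional unit does the work
    if shift_left:                  # optional one-bit left shift
        result = (result << 1) & WORD_MASK
    regs[dest] = result             # result stored back in a register
    return result


regs = {"R0": 5, "R1": 7, "R2": 0}
data_path_cycle(regs, "R0", "R1", "add", False, "R2")    # R2 = 5 + 7
data_path_cycle(regs, "R2", "R2", "pass_a", True, "R2")  # R2 = R2 << 1
```

Each call corresponds to one data path cycle: operands are read, combined, and the result is written back before the next cycle begins.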
The data path can be controlled by a sequencer that fetches microinstructions
from a control store. Each microinstruction contains bits that control the data path
for one cycle. These bits specify which operands to select, which operation to per-
form, and what to do with the results. In addition, each microinstruction specifies
its successor, typically explicitly by containing its address. Some microinstruc-
tions modify this base address by ORing bits into the address before it is used.
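As a rough sketch of this sequencing scheme, the Python below follows explicit successor addresses and optionally ORs a condition bit into the high-order address bit. The 9-bit address width, the field names next_addr and jam_z, and the tiny three-entry control store are assumptions chosen for illustration, loosely in the spirit of the Mic-1 rather than a faithful copy of it.

```python
from dataclasses import dataclass

HIGH_BIT = 0x100  # assumed 9-bit control-store address; this is its top bit


@dataclass
class MicroInstr:
    name: str
    next_addr: int       # explicit address of the successor
    jam_z: bool = False  # if set, OR the Z flag into the high address bit


def next_address(mi, z_flag):
    """Compute the successor address, ORing in the condition bit if requested."""
    addr = mi.next_addr
    if mi.jam_z and z_flag:
        addr |= HIGH_BIT
    return addr


# Hypothetical control store: the two branch targets differ only in the high bit.
control_store = {
    0x000: MicroInstr("test", next_addr=0x002, jam_z=True),
    0x002: MicroInstr("not_zero", next_addr=0x000),
    0x102: MicroInstr("zero", next_addr=0x000),
}


def run(start, z_flags):
    """Follow successors from start, consuming one Z flag value per cycle."""
    trace = []
    addr = start
    for z in z_flags:
        mi = control_store[addr]
        trace.append(mi.name)
        addr = next_address(mi, z)
    return trace


trace_zero = run(0x000, [True, False])
trace_nonzero = run(0x000, [False, False])
```

Because the condition bit is ORed into the address rather than added to it, the two possible successors must sit at addresses that differ only in that bit, which is why 0x002 and 0x102 are paired here.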
The IJVM machine is a stack machine with 1-byte opcodes that push words
onto the stack, pop words from the stack, and combine (e.g., add) words on the
stack. A microprogrammed implementation was given for the Mic-1 microarchi-
tecture. By adding an instruction fetch unit to preload the bytes in the instruction
stream, many references to the program counter could be eliminated and the
machine sped up considerably.
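To make the stack-machine behavior concrete, here is a small Python interpreter for a handful of IJVM instructions (BIPUSH, POP, DUP, IADD, ISUB). It is only a sketch of the instruction semantics: it does not model the Mic-1 microprogram, the instruction fetch unit, or memory, and the helper names run_ijvm, to_signed_byte, and wrap32 are invented for this example.

```python
# One-byte opcodes for a small subset of IJVM.
BIPUSH, POP, DUP, IADD, ISUB = 0x10, 0x57, 0x59, 0x60, 0x64


def to_signed_byte(b):
    """Interpret an unsigned byte as a signed 8-bit value."""
    return b - 256 if b >= 128 else b


def wrap32(x):
    """Wrap an integer to a signed 32-bit word."""
    return ((x + 2**31) % 2**32) - 2**31


def run_ijvm(code):
    """Execute a byte sequence and return the final operand stack."""
    stack = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        pc += 1
        if opcode == BIPUSH:            # push the sign-extended operand byte
            stack.append(to_signed_byte(code[pc]))
            pc += 1
        elif opcode == POP:             # discard the top word
            stack.pop()
        elif opcode == DUP:             # duplicate the top word
            stack.append(stack[-1])
        elif opcode == IADD:            # replace the top two words with their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(wrap32(a + b))
        elif opcode == ISUB:            # replace the top two words with a - b
            b = stack.pop()
            a = stack.pop()
            stack.append(wrap32(a - b))
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02X}")
    return stack


# BIPUSH 7; BIPUSH 5; ISUB; DUP; IADD
program = bytes([BIPUSH, 7, BIPUSH, 5, ISUB, DUP, IADD])
final_stack = run_ijvm(program)
```

Note that every instruction here touches the program counter at least once to fetch its opcode, and BIPUSH touches it again for its operand; these are the references a prefetching unit is meant to take off the critical path.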
There are many ways to design the microarchitecture level. Many trade-offs
exist, including two-bus versus three-bus designs, encoded versus decoded micro-
instruction fields, presence or absence of prefetching, shallow or deep pipelines,
and much more. The Mic-1 is a simple, software-controlled machine with sequen-
tial execution and no parallelism. In contrast, the Mic-4 is a highly parallel
microarchitecture with a seven-stage pipeline.
Performance can be improved in a variety of ways. Cache memory is a major
one. Direct-mapped caches and set-associative caches are commonly used to
speed up memory references. Branch prediction, both static and dynamic, is
important, as are out-of-order execution and speculative execution.
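A direct-mapped cache, the simpler of the two organizations mentioned above, can be sketched as follows. The class name DirectMappedCache, the line count, and the line size are assumptions for the example; the sketch tracks only tags and hit or miss outcomes, not the cached data itself.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (no data is stored)."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        block = address // self.line_size  # which memory block is addressed
        index = block % self.num_lines     # the single line it can occupy
        tag = block // self.num_lines      # identifies the block in that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag             # evict whatever was there
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=4, line_size=16)
results = [cache.access(a) for a in (0, 4, 64, 0, 16, 20)]
```

Addresses 0 and 64 map to the same line in this configuration, so they evict each other even though the other three lines are free. A set-associative cache reduces such conflicts by allowing each block to occupy any of several lines in its set.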
Our three example machines, the Core i7, OMAP4430, and ATmega168, all
have microarchitectures not visible to the ISA assembly-language programmers.
The Core i7 has a complex scheme for converting the ISA instructions into
micro-operations, caching them, and feeding them into a superscalar RISC core for
out-of-order execution, register renaming, and every other trick in the book to get