of where it is. If an interrupt occurs, instructions not yet retired are aborted.
The Core i7 thus has "precise interrupts": upon an interrupt, all instructions up
to a certain point have been completed, and no instruction beyond that point has
any effect.
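The retirement rule behind precise interrupts can be illustrated with a toy reorder buffer. This is a minimal sketch under simplifying assumptions, not a model of the Core i7's actual retirement hardware: the class and method names are hypothetical, instructions are reduced to a label and a destination register, and "committing" means copying a result into a dictionary standing in for the architectural registers. Instructions may finish in any order, but results become visible only when retired from the head of the buffer, and an interrupt discards everything that has not yet retired.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    name: str           # label for the instruction (hypothetical)
    dest: str           # architectural register the instruction writes
    value: int = 0      # result; meaningful only once done is True
    done: bool = False  # set when execution finishes (possibly out of order)


class ReorderBuffer:
    """Toy reorder buffer: results become architecturally visible only at
    retirement, and retirement proceeds strictly in program order."""

    def __init__(self):
        self.entries = deque()  # oldest instruction at the left
        self.arch_regs = {}     # committed, programmer-visible register state

    def issue(self, name, dest):
        entry = RobEntry(name, dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        # Completion may happen out of order; the result is buffered, not committed.
        entry.value = value
        entry.done = True

    def retire(self):
        # Commit finished instructions from the head only, stopping at the
        # first unfinished one, so commitment happens in program order.
        retired = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            self.arch_regs[entry.dest] = entry.value
            retired.append(entry.name)
        return retired

    def interrupt(self):
        # Precise interrupt: every instruction not yet retired is discarded,
        # so nothing beyond the retirement point has a visible effect.
        flushed = [entry.name for entry in self.entries]
        self.entries.clear()
        return flushed


rob = ReorderBuffer()
i1 = rob.issue("i1", "r1")
i2 = rob.issue("i2", "r2")
i3 = rob.issue("i3", "r3")
rob.complete(i3, 30)    # i3 finishes first, out of order
rob.complete(i1, 10)
print(rob.retire())     # ['i1']: i2 is unfinished, so i3 must wait
print(rob.interrupt())  # ['i2', 'i3']
print(rob.arch_regs)    # {'r1': 10}: i3's result was never committed
```

Although i3 finished before the interrupt, its result never reaches the architectural registers, because i2, which precedes it in program order, had not retired.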
If a store instruction has finished executing but earlier instructions are still in
progress, the L1 cache cannot be updated yet, so the result is put into a special
pending-store buffer. This buffer has 36 entries, corresponding to the 36 stores
that might be in execution at once. If a subsequent load tries to read the stored
data, the data can be passed from the pending-store buffer to the load instruction,
even though it is not yet in the L1 data cache. This process is called store-to-load
forwarding. While this forwarding mechanism may seem straightforward, in practice
it is quite complicated to implement, because intervening stores may not yet have
computed their addresses. In this case, the microarchitecture cannot know for
certain which store in the store buffer will produce the needed value. The process
of determining which store provides the value for a load is called disambiguation.
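Store-to-load forwarding and the disambiguation problem can be sketched as follows. This is an illustrative model under stated assumptions, not Intel's design: the class names are hypothetical, addresses are whole words with no partial overlaps, a dictionary stands in for the L1 data cache (missing addresses read as 0), and a store whose address is still unknown is represented by `addr=None`. When such a store lies between the load and any match, this sketch conservatively stalls the load; real processors may instead predict and speculate, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

STORE_BUFFER_ENTRIES = 36  # pending-store buffer size given in the text


@dataclass
class PendingStore:
    seq: int             # position in program order (smaller = older)
    addr: Optional[int]  # None while the store's address is still being computed
    value: int


class StoreBuffer:
    def __init__(self):
        self.stores = []

    def add(self, store: PendingStore) -> None:
        if len(self.stores) >= STORE_BUFFER_ENTRIES:
            # A real pipeline would stall issue here; the sketch just raises.
            raise RuntimeError("store buffer full")
        self.stores.append(store)

    def load(self, load_seq: int, addr: int,
             l1_cache: Dict[int, int]) -> Tuple[str, Optional[int]]:
        """Find the value a load should see.

        Returns ("forward", v) for store-to-load forwarding, ("cache", v) when
        no older store can supply the data, or ("stall", None) when an older
        store with an unknown address might be the supplier.
        """
        # Only stores older than the load matter; examine the youngest first,
        # because the youngest matching store holds the most recent value.
        older = sorted((s for s in self.stores if s.seq < load_seq),
                       key=lambda s: s.seq, reverse=True)
        for store in older:
            if store.addr is None:
                return ("stall", None)  # disambiguation cannot decide yet
            if store.addr == addr:
                return ("forward", store.value)
        return ("cache", l1_cache.get(addr, 0))


sb = StoreBuffer()
sb.add(PendingStore(seq=1, addr=0x100, value=7))
sb.add(PendingStore(seq=2, addr=0x200, value=9))
l1 = {0x300: 5}
print(sb.load(3, 0x100, l1))  # ('forward', 7)
print(sb.load(3, 0x300, l1))  # ('cache', 5)
sb.add(PendingStore(seq=4, addr=None, value=1))
print(sb.load(5, 0x300, l1))  # ('stall', None)
```

Scanning youngest-first matters: an unknown-address store older than a matching store cannot affect the result, while one between the matching store and the load can.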
It should be clear by now that the Core i7 has a highly complex microarchitecture
whose design was driven by the need to execute the old Pentium instruction set on
a modern, highly pipelined RISC core. It accomplishes this goal by breaking Pentium
instructions into micro-ops, caching them, and feeding them into the pipeline four
at a time for execution on a set of ALUs capable of executing up to six micro-ops
per cycle under optimal conditions. Micro-ops are executed out of order but retired
in order, and results are stored into the L1 and L2 caches in order.
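The idea of breaking a CISC instruction into RISC-like micro-ops and feeding them to the pipeline four at a time can be sketched in a few lines. The decomposition below is hypothetical and for illustration only: Intel's actual micro-op encoding is not modeled, the x86-style register names and bracketed memory syntax are assumptions of this sketch, only two-operand ALU instructions of the form dst = dst op src are handled, and the micro-op cache and the six-wide execution stage are left out.

```python
from typing import List, Tuple

MicroOp = Tuple[str, ...]
ISSUE_WIDTH = 4  # micro-ops fed into the pipeline per cycle, per the text


def is_mem(operand: str) -> bool:
    # In this toy syntax, memory operands are written in brackets, e.g. "[rbx]".
    return operand.startswith("[") and operand.endswith("]")


def crack(op: str, dst: str, src: str, tmp: str) -> List[MicroOp]:
    """Split a two-operand ALU instruction (dst = dst op src) into
    RISC-like micro-ops. Hypothetical decomposition for illustration only."""
    if is_mem(dst):
        # Read-modify-write on memory: load, operate in a temporary, store back.
        return [("load", tmp, dst), (op, tmp, tmp, src), ("store", dst, tmp)]
    if is_mem(src):
        # Memory source operand: load it into a temporary first.
        return [("load", tmp, src), (op, dst, dst, tmp)]
    # Register-register instructions already map onto a single micro-op.
    return [(op, dst, dst, src)]


def issue_groups(uops: List[MicroOp], width: int = ISSUE_WIDTH) -> List[List[MicroOp]]:
    # Chop the micro-op stream into groups of at most `width`, one group per cycle.
    return [uops[i:i + width] for i in range(0, len(uops), width)]


program = [("add", "[rbx]", "rax"), ("sub", "rcx", "[rdx]"), ("xor", "rax", "rax")]
uops: List[MicroOp] = []
for n, (op, dst, src) in enumerate(program):
    uops += crack(op, dst, src, tmp=f"t{n}")
for cycle, group in enumerate(issue_groups(uops)):
    print(cycle, group)
```

Three instructions become six micro-ops here, issued as a group of four followed by a group of two; the register-register `xor` needs no splitting at all, which is the sense in which RISC instructions are already micro-op-like.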
4.6.2 The Microarchitecture of the OMAP4430 CPU
At the heart of the OMAP4430 system-on-a-chip are two ARM Cortex A9
processors. The Cortex A9 is a high-performance microarchitecture that implements
the ARM instruction set (version 7). The processor was designed by ARM Ltd., and
it is included, with slight variations, in a wide variety of embedded devices. ARM
does not manufacture the processor; it only supplies the design to silicon
manufacturers that want to incorporate it into their system-on-a-chip designs (Texas
Instruments, in this case).
The Cortex A9 processor is a 32-bit machine, with 32-bit registers and a 32-bit
data path. Like the internal architecture, the memory bus is 32 bits wide. Unlike the
Core i7, the Cortex A9 is a true RISC architecture, which means that it does not
need a complex mechanism to convert old CISC instructions into micro-ops for
execution. The core instructions are, in fact, already micro-op-like ARM instructions.
However, in recent years, more complex graphics and multimedia instructions have
been added, requiring special hardware facilities for their execution.
Overview of the OMAP4430's Cortex A9 Microarchitecture
The block diagram of the Cortex A9 microarchitecture is given in Fig. 4-48.
On the whole, it is much simpler than the Core i7's Sandy Bridge microarchitecture
because it has a simpler ISA to implement. Nevertheless, some of