THE DIGITAL LOGIC LEVEL - Structured Computer Organization


Despite their bent for joule-frugal operation, the ARM A9 cores use a very capable microarchitecture. They can decode and execute up to two instructions each cycle. As we will learn in Chap. 4, this execution rate represents the maximum throughput of the microarchitecture. But do not expect it to execute this many instructions each cycle. Rather, think of this rate as the manufacturer's guaranteed maximum performance, a level that the processor will never exceed, no matter what. In many cycles, fewer than two instructions will execute due to the myriad "hazards" that can stall instructions, leading to lower execution throughput. To address many of these throughput limiters, the ARM A9 incorporates a powerful branch predictor, out-of-order instruction scheduling, and a highly optimized memory system.
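
The gap between peak and achieved throughput can be illustrated with a rough calculation. The following is only a sketch under a deliberately simple model: the two-instructions-per-cycle peak comes from the text, while the function name, the instruction count, and the stall-cycle figures are hypothetical values chosen for illustration, not measurements of an ARM A9.

```python
PEAK_ISSUE_WIDTH = 2  # from the text: up to two instructions per cycle


def effective_ipc(instructions, stall_cycles, width=PEAK_ISSUE_WIDTH):
    """Instructions per cycle under a simplified model.

    Assumes the core issues `width` instructions in every cycle that is
    not a stall cycle. Real hazards overlap in more complicated ways.
    """
    issue_cycles = -(-instructions // width)  # ceiling division
    return instructions / (issue_cycles + stall_cycles)


# Hypothetical workload: 1000 instructions with no stalls, then with 300
# stall cycles caused by hazards.
print(effective_ipc(1000, 0))    # 2.0  (the peak, never exceeded)
print(effective_ipc(1000, 300))  # 1.25
```

In this model the peak rate is reached only when there are no stall cycles at all, which is why it is best read as an upper bound.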

The OMAP4430's memory system has two main internal L1 caches for each ARM A9 processor: a 32-KB cache for instructions and a 32-KB cache for data. Like the Core i7, it also uses an on-chip level 2 (L2) cache, but unlike the Core i7, it is a relatively tiny 1 MB in size, and it is shared by both ARM A9 cores. The caches are fed with dual LPDDR2 low-power DRAM channels. LPDDR2 is derived from the DDR2 memory interface standard, but changed to require fewer wires and to operate at more power-efficient voltages. Additionally, the memory controller incorporates a number of memory-access optimizations, such as tiled memory prefetching and in-memory rotation support.

While we will discuss caching in detail in Chap. 4, a few words about it here will be useful. All of main memory is divided up into cache lines (blocks) of 32 bytes. The 1024 most heavily used instruction lines and the 1024 most heavily used data lines are in the level 1 cache. Cache lines that are heavily used but which do not fit in the level 1 cache are kept in the level 2 cache. This cache contains both data lines and instruction lines from both ARM A9 CPUs mixed at random. The level 2 cache contains the most recently touched 32,768 lines in main memory.
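
The line counts quoted above follow directly from the cache sizes and the 32-byte line size. The short sketch below reproduces that arithmetic and shows how a byte address maps to a line number and an offset within the line. The sizes come from the text; the helper names and the sample address 0x1234 are only illustrative.

```python
# Cache geometry taken from the text.
LINE_SIZE = 32            # bytes per cache line
L1_SIZE = 32 * 1024       # each L1 cache (instruction or data) is 32 KB
L2_SIZE = 1024 * 1024     # the shared L2 cache is 1 MB


def lines_in_cache(cache_bytes, line_bytes=LINE_SIZE):
    """Number of cache lines a cache of the given size can hold."""
    return cache_bytes // line_bytes


def line_number(byte_address, line_bytes=LINE_SIZE):
    """Index of the main-memory line (block) containing a byte address."""
    return byte_address // line_bytes


def line_offset(byte_address, line_bytes=LINE_SIZE):
    """Position of the byte within its cache line."""
    return byte_address % line_bytes


print(lines_in_cache(L1_SIZE))                    # 1024
print(lines_in_cache(L2_SIZE))                    # 32768
print(line_number(0x1234), line_offset(0x1234))   # 145 20
```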

On a level 1 cache miss, the CPU sends the identifier of the line it is looking for (Tag address) to the level 2 cache. The reply (Tag data) provides the information for the CPU to tell whether the line is in the level 2 cache, and if so, what state it is in. If the line is cached there, the CPU goes and gets it. Getting a value out of the level 2 cache takes 19 cycles. This is a long time to wait for data, so clever programmers will optimize their programs to use less data, making it more likely to find data in the fast level 1 cache.
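
To see why a smaller working set pays off, consider a back-of-the-envelope estimate of the average cost of a memory access. This is a sketch only: the 19-cycle level 2 latency is from the text, but the 1-cycle level 1 hit time and both miss rates are assumed values, and the model assumes that every level 1 miss hits in the level 2 cache and that the 19 cycles are paid on top of the level 1 access.

```python
L2_ACCESS_CYCLES = 19  # from the text


def average_access_cycles(l1_hit_cycles, l1_miss_rate,
                          l2_cycles=L2_ACCESS_CYCLES):
    """Average cycles per memory access in a simplified two-level model.

    Assumes every L1 miss is satisfied by the L2 cache (no main-memory
    accesses) and that the L2 latency is added to the L1 access time.
    """
    return l1_hit_cycles + l1_miss_rate * l2_cycles


# Assumed values: 1-cycle L1 hits, and a program whose L1 miss rate drops
# from 10% to 2% after it is reworked to touch less data.
before = average_access_cycles(1, 0.10)
after = average_access_cycles(1, 0.02)
print(f"{before:.2f} cycles per access")  # 2.90 cycles per access
print(f"{after:.2f} cycles per access")   # 1.38 cycles per access
```

Under these assumptions, cutting the miss rate from 10% to 2% roughly halves the average access time, even though the cost of an individual miss is unchanged.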

If the cache line is not in the level 2 cache, it must be fetched from main memory via the LPDDR2 memory interface. The OMAP4430 LPDDR2 interface is implemented on-chip such that LPDDR2 DRAM can be connected directly to the OMAP4430. To access memory, the CPU must first send the upper portion of the DRAM address to the DRAM chip, using the 13 address lines. This operation, called an ACTIVATE, loads an entire row of memory within the DRAM into a row buffer. Subsequently, the CPU can issue multiple READ or WRITE commands, sending the remainder of the address on the same 13 address lines, and sending (or receiving) the data for the operation on the 32 data lines.
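
The two-step addressing described above can be sketched as follows. The 13 address lines and the ACTIVATE and READ commands come from the text; the split into 13 row bits and 10 column bits, the function names, and the sample addresses are assumptions made for illustration and do not describe the actual OMAP4430 address map. Bank selection and the closing of an open row are left out.

```python
ADDRESS_LINES = 13  # from the text: row and column share these 13 lines
ROW_BITS = 13       # assumption: the row address uses all 13 lines
COL_BITS = 10       # assumption: hypothetical column width (at most 13)


def split_dram_address(word_address):
    """Split a DRAM word address into (row, column).

    The upper bits select the row (sent with ACTIVATE); the lower bits
    select the column (sent with READ or WRITE).
    """
    col = word_address & ((1 << COL_BITS) - 1)
    row = (word_address >> COL_BITS) & ((1 << ROW_BITS) - 1)
    return row, col


def commands_for_reads(word_addresses):
    """Command sequence for a series of reads.

    A row is activated only when it differs from the row already held in
    the row buffer, so consecutive reads to one row need one ACTIVATE.
    """
    open_row = None
    commands = []
    for address in word_addresses:
        row, col = split_dram_address(address)
        if row != open_row:
            commands.append(("ACTIVATE", row))
            open_row = row
        commands.append(("READ", col))
    return commands


print(commands_for_reads([0x400, 0x404, 0x800]))
# [('ACTIVATE', 1), ('READ', 0), ('READ', 4), ('ACTIVATE', 2), ('READ', 0)]
```

The first two reads fall in the same row and share a single ACTIVATE, which is the benefit of the row buffer; the third read touches a different row and needs a new one.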
