Processor cache coherency is a critical subject in the age of multicore processors, and we will
examine it in detail in Chapter 5.
2.6 Putting It All Together: Memory Hierarchies in the
ARM Cortex-A8 and Intel Core i7
This section describes the ARM Cortex-A8 (hereafter called the Cortex-A8) and Intel Core i7
(hereafter called i7) memory hierarchies and shows the performance of their components on
a set of single-threaded benchmarks. We examine the Cortex-A8 first because it has a simpler
memory system; we go into more detail for the i7, tracing out a memory reference in detail.
This section presumes that readers are familiar with the organization of a two-level cache
hierarchy using virtually indexed caches. The basics of such a memory system are explained in
detail in Appendix B, and readers who are uncertain of the organization of such a system are
strongly advised to review the Opteron example in Appendix B. Once they understand the
organization of the Opteron, the brief explanation of the Cortex-A8 system, which is similar,
will be easy to follow.
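As a brief refresher on the cache organization this section presumes, the following Python
sketch splits an address into tag, set index, and block offset for a set-associative cache.
The class name CacheGeometry, the 64-byte block size, and the sample address are illustrative
assumptions and do not come from the text; the capacity and associativity are likewise just
example values. In a virtually indexed cache, the index bits would be taken from the virtual
address rather than the translated physical address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper describing a set-associative cache."""
    size_bytes: int   # total data capacity
    ways: int         # associativity (blocks per set)
    block_bytes: int  # block (line) size; an assumed value, must be a power of two

    @property
    def num_sets(self) -> int:
        # capacity = sets * ways * block size
        return self.size_bytes // (self.ways * self.block_bytes)

    @property
    def offset_bits(self) -> int:
        # log2 of the block size
        return self.block_bytes.bit_length() - 1

    @property
    def index_bits(self) -> int:
        # log2 of the number of sets
        return self.num_sets.bit_length() - 1

    def split(self, addr: int) -> tuple[int, int, int]:
        """Return (tag, set index, block offset) for an address."""
        offset = addr & (self.block_bytes - 1)
        index = (addr >> self.offset_bits) & (self.num_sets - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset


# Example values only: 32 KB, four-way, 64-byte blocks (block size is assumed).
l1 = CacheGeometry(size_bytes=32 * 1024, ways=4, block_bytes=64)
print(l1.num_sets, l1.offset_bits, l1.index_bits)
print(l1.split(0x12345678))
```

With these example values the cache has 128 sets, so an address contributes 6 offset bits
and 7 index bits, and the remaining upper bits form the tag.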
The ARM Cortex-A8
The Cortex-A8 is a configurable core that supports the ARMv7 instruction set architecture. It
is delivered as an IP (intellectual property) core. IP cores are the dominant form of technology
delivery in the embedded, personal mobile device (PMD), and related markets; billions of ARM
and MIPS processors have been created from these IP cores. Note that IP cores are different
from the cores in the Intel i7 or AMD Athlon multicores. An IP core (which may itself be a
multicore) is designed to be incorporated with other logic (hence it is the core of a chip),
including application-specific processors (such as an encoder or decoder for video), I/O
interfaces, and memory interfaces, and then fabricated to yield a processor optimized for a
particular application. For example, the Cortex-A8 IP core is used in the Apple iPad and in
smartphones from several manufacturers, including Motorola and Samsung. Although the
processor core is almost identical, the resultant chips have many differences.
Generally, IP cores come in two flavors. Hard cores are optimized for a particular
semiconductor vendor and are black boxes with external (but still on-chip) interfaces. Hard
cores typically allow parametrization only of logic outside the core, such as L2 cache sizes,
and the IP core cannot be modified. Soft cores are usually delivered in a form that uses a
standard library
of logic elements. A soft core can be compiled for different semiconductor vendors and can
also be modified, although extensive modifications are very difficult due to the complexity of
modern-day IP cores. In general, hard cores provide higher performance and smaller die area,
while soft cores allow retargeting to other vendors and can be more easily modified.
The Cortex-A8 can issue two instructions per clock at clock rates up to 1 GHz. It can support
a two-level cache hierarchy in which the first level is a pair of caches (one for instructions
and one for data), each 16 KB or 32 KB, organized as four-way set associative and using way
prediction and random replacement. The goal is single-cycle access latency for the caches,
which allows the Cortex-A8 to maintain a load-to-use delay of one cycle, keeps instruction
fetch simpler, and lowers the penalty for fetching the correct instruction when a branch
misprediction causes the wrong instruction to be prefetched. The optional second-level cache,
when present, is eight-way set associative and can be configured from 128 KB up to 1 MB; it
is organized into one to four banks to allow several transfers from memory to occur
concurrently. An external bus of 64 to 128 bits handles
 