compatible with the 8088 and can run unmodified 8088 binary programs (not to
mention programs for all the intermediate processors as well).
From a software point of view, the Core i7 is a full 64-bit machine. It has all
the same user-level ISA features as the 80386, 80486, Pentium, Pentium Pro,
Pentium II, Pentium III, and Pentium 4, including the same registers, the same
instructions, and a full on-chip implementation of the IEEE 754 floating-point
standard. In addition, it has some new instructions intended primarily for
cryptographic operations.
The Core i7 processor is a multicore CPU, so the silicon die contains multiple
processors. The CPU is sold with a varying number of processors, ranging from 2
to 6, with more planned for the near future. If programmers write a parallel
program using threads and locks, they can gain significant speedups by
exploiting parallelism on multiple processors. In addition, the individual CPUs
are "hyperthreaded," meaning that multiple hardware threads can be active
simultaneously. Hyperthreading (more typically called "simultaneous
multithreading" by computer architects) allows very short latencies, such as
cache misses, to be tolerated with hardware thread switches. Software-based
threading can tolerate only very long latencies, such as page faults, because a
software-based thread switch takes hundreds of cycles.
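As a rough illustration of the "threads and locks" style mentioned above, the following sketch splits a summation across several threads and uses a lock to protect the shared result. The function name parallel_sum and its parameters are hypothetical, chosen only for this example. Note that in standard CPython builds the global interpreter lock keeps CPU-bound threads from running truly in parallel, so this shows the program structure rather than the speedup; the same structure in a language such as C or Java would be able to use multiple cores.

```python
import threading


def parallel_sum(values, num_threads=4):
    """Sum `values` by splitting the work across `num_threads` threads."""
    total = 0
    lock = threading.Lock()  # protects the shared variable `total`

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)  # private work: no lock needed
        with lock:            # short critical section on shared state
            total += partial

    # Strided slices give each thread a roughly equal share of the input.
    threads = [
        threading.Thread(target=worker, args=(values[i::num_threads],))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading `total`
    return total


result = parallel_sum(list(range(1_000_000)))
```

Each thread does most of its work on private data and holds the lock only briefly, which is the usual way to keep lock contention from eating the gains of parallelism.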
Internally, at the microarchitecture level, the Core i7 is a very capable design.
It is based on the architecture of its predecessors, the Core 2 and Core 2 Duo. The
Core i7 processor can carry out up to four instructions at once, making it a 4-wide
superscalar machine. We will examine the microarchitecture in Chap. 4.
All Core i7 processors have three levels of cache. Each core in a Core i7
processor has a 32-KB level 1 (L1) data cache and a 32-KB level 1 instruction
cache. Each core also has its own 256-KB level 2 (L2) cache. The second-level
cache is unified, which means that it can hold a mixture of instructions and
data. All cores share a single unified level 3 (L3) cache, the size of which
varies from 4 to 15 MB depending on the processor model. Having three levels of
cache significantly improves processor performance but at a great cost in
silicon area, as Core i7 CPUs can have as much as 17 MB of total cache on a
single silicon die.
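The 17-MB figure can be checked with a little arithmetic. The sketch below adds up the per-core caches and the shared L3 using the sizes given above. It assumes, for illustration, that the largest configuration pairs 6 cores with the 15-MB L3; the text does not state which core counts ship with which L3 sizes. The helper name total_cache_bytes is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def total_cache_bytes(cores, l3_bytes, l1d=32 * KB, l1i=32 * KB, l2=256 * KB):
    """Total on-die cache: private L1 data, L1 instruction, and L2 caches
    for every core, plus the single shared L3."""
    return cores * (l1d + l1i + l2) + l3_bytes


# Assumed largest configuration: 6 cores with a 15-MB shared L3.
largest = total_cache_bytes(cores=6, l3_bytes=15 * MB)
print(f"{largest / MB:.3f} MB")  # prints "16.875 MB"
```

Under that assumption the private caches contribute 6 x 320 KB = 1920 KB, so the total is 16.875 MB, which rounds to the 17 MB quoted.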
Since all Core i7 chips have multiple processors with private data caches, a
problem arises when a processor modifies a word in its private cache that is
also held in another processor's cache. If the other processor tries to read
that word from memory, it will get a stale value, since modified cache words are
not written back to memory immediately. To maintain memory consistency, each CPU
in a multiprocessor system snoops on the memory bus, looking for references to
words it has cached. When it sees such a reference, it jumps in and supplies the
required data before the memory gets a chance to do so. We will study snooping
in Chap. 8.
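The snooping idea can be sketched with a toy model. This is not the Core i7's actual coherence protocol: the class names Bus and Cache are hypothetical, caches hold single words rather than lines, and the invalidation of other copies on a write is an added assumption needed to make the toy consistent. It shows only the behavior described above: a cache holding a modified word answers a bus read before the stale memory does.

```python
class Bus:
    """Toy shared memory bus that every cache snoops."""

    def __init__(self, memory):
        self.memory = memory  # address -> value (main memory)
        self.caches = []

    def read(self, requester, addr):
        # Each other cache snoops the read. One holding a modified copy
        # supplies the data before memory gets a chance to.
        for cache in self.caches:
            if cache is not requester and addr in cache.dirty:
                return cache.words[addr]
        return self.memory[addr]

    def invalidate(self, writer, addr):
        # Assumption for this sketch: a write makes other caches drop
        # their now-stale copies of the word.
        for cache in self.caches:
            if cache is not writer:
                cache.words.pop(addr, None)
                cache.dirty.discard(addr)


class Cache:
    """Toy private write-back cache."""

    def __init__(self, bus):
        self.bus = bus
        self.words = {}     # address -> cached value
        self.dirty = set()  # addresses modified but not yet written back
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.words:  # miss: go to the bus
            self.words[addr] = self.bus.read(self, addr)
        return self.words[addr]

    def write(self, addr, value):
        # Write-back policy: main memory is NOT updated here.
        self.words[addr] = value
        self.dirty.add(addr)
        self.bus.invalidate(self, addr)


memory = {0x10: 1}
bus = Bus(memory)
a, b = Cache(bus), Cache(bus)
b.read(0x10)                    # b caches the original value
a.write(0x10, 99)               # memory still holds the stale value 1
value_seen_by_b = b.read(0x10)  # a's cache supplies the modified word
```

After the write, main memory still holds the old value, yet the second processor reads the new one because the first cache answers the bus request.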
Two primary external buses are used in Core i7 systems, both of them
synchronous. A DDR3 memory bus is used to access the main memory DRAM, and a
PCI Express bus connects the processor to I/O devices. High-end versions of the
Core i7 include multiple memory and PCI Express buses, and they also include a
 