• 8KB or 16KB L1 cache plus 12K micro-op execution trace cache.
• 256KB, 512KB, 1MB, or 2MB of on-die, full-core speed 256-bit-wide L2 cache with
eight-way associativity.
• L2 cache that can handle all physical memory and supports ECC.
• 2MB of on-die, full-speed L3 cache (Extreme Edition).
• SSE2: SSE plus 144 new instructions for graphics and sound processing (Willamette
and Northwood).
• SSE3: SSE2 plus 13 new instructions for graphics and sound processing (Prescott).
• Enhanced floating-point unit.
• Multiple low-power states.
See “IA-32e 64-Bit Extension Mode (AMD64, x86-64, EM64T),” p. 47 (this chapter).
Intel abandoned Roman numerals for a standard Arabic numeral 4 designation to identify
the Pentium 4. Internally, the Pentium 4 introduces a new architecture that Intel calls
NetBurst microarchitecture, which is a marketing term and not a technical term. Intel uses
NetBurst to describe hyper-pipelined technology, a rapid execution engine, a high-speed
(400MHz, 533MHz, 800MHz, or 1,066MHz) system bus, and an execution trace cache.
The hyper-pipelined technology doubles or triples the instruction pipeline depth compared
to the Pentium III (or Athlon/Athlon 64), meaning more and smaller steps are required to
execute instructions. Even though this might seem less efficient, it enables much higher
clock speeds to be more easily attained. The rapid execution engine enables the two
integer ALUs to run at twice the processor core frequency, which means instructions can
execute in half a clock cycle. The 400MHz/533MHz/800MHz/1,066MHz system bus is a
quad-pumped bus running off a 100MHz/133MHz/200MHz/266MHz system clock,
transferring data four times per clock cycle. The execution trace cache is a high-performance
Level 1 cache that stores approximately 12K decoded micro-operations. This removes the
instruction decoder from the main execution pipeline, increasing performance.
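
The half-cycle claim for the rapid execution engine follows from simple arithmetic. The sketch below is only an illustration: the function name alu_timing and the 3,000MHz core clock are assumptions chosen for the example, not figures from the text.

```python
def alu_timing(core_clock_mhz, alu_multiplier=2):
    """Return (ALU clock in MHz, core cycle time in ns, ALU cycle time in ns).

    alu_multiplier=2 reflects the text's statement that the integer ALUs
    run at twice the processor core frequency.
    """
    core_cycle_ns = 1000.0 / core_clock_mhz
    alu_clock_mhz = core_clock_mhz * alu_multiplier
    alu_cycle_ns = 1000.0 / alu_clock_mhz
    return alu_clock_mhz, core_cycle_ns, alu_cycle_ns


# Hypothetical 3,000MHz core clock, used only as an example input.
alu_mhz, core_ns, alu_ns = alu_timing(3000.0)
print(f"ALU clock: {alu_mhz:.0f}MHz")
print(f"Core cycle: {core_ns:.4f}ns, ALU cycle: {alu_ns:.4f}ns")
```

Whatever the core clock, the ALU cycle time comes out at half the core cycle time, which is what "execute in half a clock cycle" refers to.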
Of these, the high-speed processor bus is most notable. Technically speaking, the
processor bus is a 100MHz, 133MHz, 200MHz, or 266MHz quad-pumped bus that transfers
four times per cycle (4x), for a 400MHz, 533MHz, 800MHz, or 1,066MHz effective rate.
Because the bus is 64 bits (8 bytes) wide, this results in a throughput rate of 3,200MBps,
4,266MBps, 6,400MBps, or 8,533MBps.
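
The throughput figures above can be reproduced with a short calculation. This is a minimal sketch; the function name quad_pumped_bus is hypothetical, and it assumes the nominal 133MHz and 266MHz clocks are really 133.33MHz and 266.67MHz, which is why the results land on 4,266MBps and 8,533MBps rather than 4,256MBps and 8,512MBps.

```python
def quad_pumped_bus(base_clock_mhz, transfers_per_clock=4, bus_width_bytes=8):
    """Return (effective rate in MHz, peak throughput in MBps).

    Defaults follow the text: four transfers per clock on a 64-bit (8-byte) bus.
    """
    effective_mhz = base_clock_mhz * transfers_per_clock
    throughput_mbps = effective_mhz * bus_width_bytes
    return effective_mhz, throughput_mbps


# Assumption: the 133MHz and 266MHz clocks are 400/3 MHz and 800/3 MHz.
BASE_CLOCKS_MHZ = [100.0, 400.0 / 3, 200.0, 800.0 / 3]

for clock in BASE_CLOCKS_MHZ:
    rate, mbps = quad_pumped_bus(clock)
    # int() drops the fraction, matching how the figures are quoted.
    print(f"{clock:7.2f}MHz clock: {int(rate)}MHz effective, {int(mbps)}MBps")
```

These are peak theoretical rates; sustained throughput in a real system would be lower.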
In the Pentium 4's 20-stage or 31-stage pipelined internal architecture, individual
instructions are broken down into many more substages than with previous processors such as the
Pentium III, making this almost like a RISC processor. Unfortunately, this can add to the
number of cycles taken to execute instructions if they are not optimized for this processor.
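
A toy model can show why a deeper pipeline costs more cycles when code is not well suited to it. Everything in this sketch is an assumption made for illustration: the function name total_cycles, the idealized one-instruction-per-cycle pipeline, the rule that each pipeline flush costs a full refill of depth cycles, the 10-stage comparison point (inferred from "doubles or triples" above), and the instruction and flush counts. It is not a model of the actual Pentium 4.

```python
def total_cycles(instructions, depth, flushes=0):
    """Cycles to complete `instructions` on an idealized scalar pipeline.

    Assumptions (illustrative only):
    - one instruction enters the pipeline per cycle;
    - the first instruction needs `depth` cycles to complete;
    - each flush costs a full refill of `depth` cycles.
    """
    return depth + (instructions - 1) + flushes * depth


INSTRUCTIONS = 10_000   # hypothetical workload size
FLUSHES = 500           # hypothetical number of pipeline flushes

# 20 and 31 stages come from the text; 10 is an assumed shallower pipeline.
for depth in (10, 20, 31):
    ideal = total_cycles(INSTRUCTIONS, depth)
    stalled = total_cycles(INSTRUCTIONS, depth, FLUSHES)
    print(f"{depth:2d} stages: {ideal} cycles ideal, {stalled} cycles with flushes")
```

With no flushes, the depth barely matters, but every flush costs more cycles as the pipeline gets deeper. The higher clock speed has to make up that difference.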