Hardware Reference
• Software compatible with previous Intel 32-bit processors.
• Some versions supporting EM64T (64-bit extensions) and Execute Disable Bit (buffer
overflow protection).
• Processor (front-side) bus that runs at 400MHz, 533MHz, 800MHz, or 1,066MHz.
• Arithmetic logic units (ALUs) that run at twice the processor core frequency.
• Hyper-pipelined (20-stage or 31-stage) technology.
• HT Technology support in all 2.4GHz and faster processors running an 800MHz bus and all
3.06GHz and faster processors running a 533MHz bus.
• Deep out-of-order instruction execution.
• Enhanced branch prediction.
• 8KB or 16KB L1 cache plus 12K micro-op execution trace cache.
• 256KB, 512KB, 1MB, or 2MB of on-die, full-core speed 256-bit-wide L2 cache with eight-
way associativity.
• L2 cache that can handle all physical memory and supports ECC.
• 2MB of on-die, full-speed L3 cache (Extreme Edition).
• SSE2: SSE plus 144 new instructions for graphics and sound processing (Willamette and
Northwood).
• SSE3: SSE2 plus 13 new instructions for graphics and sound processing (Prescott).
• Enhanced floating-point unit.
• Multiple low-power states.
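The HT Technology item in the list above amounts to a simple rule on core clock and bus speed. A minimal sketch of that rule follows; the function name and parameters are hypothetical, and the check covers only the two cases the list states. It makes no claim about processors on other bus speeds.

```python
def meets_listed_ht_criteria(core_ghz, bus_mhz):
    """Return True if a Pentium 4 falls into one of the two classes
    that the feature list says support HT Technology.

    Hypothetical helper: it encodes only the rule quoted in the list,
    not a complete table of HT-capable processors.
    """
    if bus_mhz == 800:
        # All 2.4GHz and faster processors on an 800MHz bus
        return core_ghz >= 2.4
    if bus_mhz == 533:
        # All 3.06GHz and faster processors on a 533MHz bus
        return core_ghz >= 3.06
    # Other bus speeds are not covered by the rule in the list
    return False


print(meets_listed_ht_criteria(2.8, 800))
print(meets_listed_ht_criteria(2.8, 533))
```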
Intel abandoned Roman numerals for a standard Arabic numeral 4 designation to identify the Pentium
4. Internally, the Pentium 4 introduces a new architecture that Intel calls the NetBurst microarchitecture,
which is a marketing term rather than a technical one. Intel uses NetBurst to describe hyper-pipelined
technology, a rapid execution engine, a high-speed (400MHz, 533MHz, 800MHz, or 1,066MHz)
system bus, and an execution trace cache. The hyper-pipelined technology doubles or triples the
instruction pipeline depth compared to the Pentium III (or Athlon/Athlon 64), meaning more and
smaller steps are required to execute instructions. Although this might seem less efficient, it
makes much higher clock speeds easier to attain. The rapid execution engine enables the
two integer ALUs to run at twice the processor core frequency, which means instructions can execute
in half a clock cycle. The 400MHz/533MHz/800MHz/1,066MHz system bus is a quad-pumped bus
running off a 100MHz/133MHz/200MHz/266MHz system clock, transferring data four times per clock
cycle. The execution trace cache is a high-performance Level 1 cache that stores approximately 12K
decoded micro-operations. This removes the instruction decoder from the main execution pipeline,
increasing performance.
Of these, the high-speed processor bus is most notable. Technically speaking, the processor bus is a
100MHz, 133MHz, 200MHz, or 266MHz quad-pumped bus that transfers data four times per cycle (4x),
for a 400MHz, 533MHz, 800MHz, or 1,066MHz effective rate. Because the bus is 64 bits (8 bytes)
wide, this results in a throughput rate of 3,200MBps, 4,266MBps, 6,400MBps, or 8,533MBps.
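The effective rates and throughput figures above are plain arithmetic: base clock times four transfers per cycle, times 8 bytes per transfer. The following is a minimal sketch of that calculation. The function names are illustrative, and treating the nominal 133MHz and 266MHz clocks as 400/3 and 800/3 MHz is an assumption made here so that the results line up with the quoted figures after truncation.

```python
from fractions import Fraction

# Nominal base clocks in MHz. Using 400/3 and 800/3 for the "133MHz" and
# "266MHz" clocks is an assumption of this sketch, not stated in the text.
BASE_CLOCKS_MHZ = [Fraction(100), Fraction(400, 3), Fraction(200), Fraction(800, 3)]


def effective_rate_mhz(base_clock_mhz, transfers_per_clock=4):
    """Effective rate of a quad-pumped bus: base clock x transfers per cycle."""
    return base_clock_mhz * transfers_per_clock


def throughput_mbps(base_clock_mhz, bus_width_bits=64, transfers_per_clock=4):
    """Peak throughput in MBps: effective rate x bus width in bytes."""
    bytes_per_transfer = bus_width_bits // 8
    return effective_rate_mhz(base_clock_mhz, transfers_per_clock) * bytes_per_transfer


for clock in BASE_CLOCKS_MHZ:
    # int() truncates the fractional part, matching how the figures are quoted
    print(int(effective_rate_mhz(clock)), "MHz ->", int(throughput_mbps(clock)), "MBps")
```

Keeping the bus width and transfers per clock as parameters makes it easy to compare against a bus that is not quad-pumped, for example by passing transfers_per_clock=1.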
In the Pentium 4's 20-stage or 31-stage pipelined internal architecture, individual instructions are
broken down into many more substages than with previous processors such as the Pentium III, making
this almost like a RISC processor. Unfortunately, this can add to the number of cycles taken to execute