Graphics Reference
In-Depth Information
Latency hiding: Like the Core 2 Extreme QX9770 CPU, Larrabee relies
more heavily on its large, hierarchical caches to hide memory latency. 19
Each Larrabee core supports only four threads, against the 48 threads that
implement the GeForce 9800 GTX's block multithreading. 20
Cache coherence: Larrabee's cache hierarchy maintains full coherence,
just as the Core 2 Extreme QX9770 CPU's caches do (though at some-
what higher cost, since Larrabee has at least four times as many caches at
each level of the hierarchy). Cache coherence is not implemented by the
GeForce 9800 GTX.
Larrabee's strong implementation parallels to the GeForce 9800 GTX GPU
are not matched in terms of architecture, where parallels to the Core 2 Extreme
QX9770 CPU dominate.
ISA: The Larrabee and Core 2 Extreme QX9770 cores share Intel's IA-
32 instruction set architecture (ISA), though Larrabee's ISA is augmented
with vector instructions. While the GeForce 9800 GTX cores are pro-
grammable, their ISA is hidden beneath the OpenGL and Direct3D shader
languages. Both Larrabee and Core 2 Extreme QX9770 can be pro-
grammed to support OpenGL and Direct3D, and in fact Intel engineers
ported both APIs to Larrabee (using binned rendering!), but these APIs are
not architectural. Indeed, there is no evidence of a pipeline model in either
CPU-like architecture.
SPMD: Like the Core 2 Extreme QX9770, Larrabee cores are exposed in
the ISA as scalar with vector instructions. Each core has 17 ALUs: one
scalar and 16 vector. Programmers use a combination of scalar and vector
instructions to explicitly manage control flow (a single program counter for
the entire core) and vector operations on data. Predication is also explicit—
programmers manipulate a mask to control which vector elements are com-
mitted. The GeForce 9800 GTX hides all this complexity behind its SPMD
architecture, allowing programmers to treat each vector element as an indi-
vidual execution unit with full flow control (its own program counter) and
data execution capability.
Larrabee's CPU-like architecture gives it
important advantages over the
GeForce 9800 GTX's SPMD architecture.
Flexibility: Larrabee executes arbitrary graphics algorithms with equal
efficiency, because it isn't optimized for a particular algorithm, as the
GeForce 9800 GTX is for the “graphics pipeline.”
Generality: Larrabee executes arbitrary non graphics algorithms efficiently
too.
Capability: Like the Core 2 Extreme QX9770, Larrabee can run system-
level programs up to and including operating systems. The GeForce 9800
GTX cannot.
19. Larrabee's cache sizes are 64 Kbytes $L1 and 256 Kbytes $L2. While NVIDIA does
not publish figures for the GeForce 9800 GTX, they are almost certainly much lower.
20. A technique called software fibers —explicitly coding a multithread instruction
sequence into a single thread—is also used to hide latency. It lacks the GPU capability
of block scheduling, however, and is therefore less effective.
 
Search WWH ::




Custom Search