Modern Graphics Hardware - Computer Graphics: Principles and Practice

Graphics Reference

In-Depth Information

• Latency hiding: Like the Core 2 Extreme QX9770 CPU, Larrabee relies

more heavily on its large, hierarchical caches to hide memory latency. 19

Each Larrabee core supports only four threads, against the 48 threads that

implement the GeForce 9800 GTX's block multithreading. 20

• Cache coherence: Larrabee's cache hierarchy maintains full coherence,

just as the Core 2 Extreme QX9770 CPU's caches do (though at some-

what higher cost, since Larrabee has at least four times as many caches at

each level of the hierarchy). Cache coherence is not implemented by the

GeForce 9800 GTX.

Larrabee's strong implementation parallels to the GeForce 9800 GTX GPU

are not matched in terms of architecture, where parallels to the Core 2 Extreme

QX9770 CPU dominate.

• ISA: The Larrabee and Core 2 Extreme QX9770 cores share Intel's IA-

32 instruction set architecture (ISA), though Larrabee's ISA is augmented

with vector instructions. While the GeForce 9800 GTX cores are pro-

grammable, their ISA is hidden beneath the OpenGL and Direct3D shader

languages. Both Larrabee and Core 2 Extreme QX9770 can be pro-

grammed to support OpenGL and Direct3D, and in fact Intel engineers

ported both APIs to Larrabee (using binned rendering!), but these APIs are

not architectural. Indeed, there is no evidence of a pipeline model in either

CPU-like architecture.

• SPMD: Like the Core 2 Extreme QX9770, Larrabee cores are exposed in

the ISA as scalar with vector instructions. Each core has 17 ALUs: one

scalar and 16 vector. Programmers use a combination of scalar and vector

instructions to explicitly manage control flow (a single program counter for

the entire core) and vector operations on data. Predication is also explicit—

programmers manipulate a mask to control which vector elements are com-

mitted. The GeForce 9800 GTX hides all this complexity behind its SPMD

architecture, allowing programmers to treat each vector element as an indi-

vidual execution unit with full flow control (its own program counter) and

data execution capability.

Larrabee's CPU-like architecture gives it

important advantages over the

GeForce 9800 GTX's SPMD architecture.

• Flexibility: Larrabee executes arbitrary graphics algorithms with equal

efficiency, because it isn't optimized for a particular algorithm, as the

GeForce 9800 GTX is for the “graphics pipeline.”

• Generality: Larrabee executes arbitrary non graphics algorithms efficiently

too.

• Capability: Like the Core 2 Extreme QX9770, Larrabee can run system-

level programs up to and including operating systems. The GeForce 9800

GTX cannot.

19. Larrabee's cache sizes are 64 Kbytes $L1 and 256 Kbytes $L2. While NVIDIA does

not publish figures for the GeForce 9800 GTX, they are almost certainly much lower.

20. A technique called software fibers —explicitly coding a multithread instruction

sequence into a single thread—is also used to hide latency. It lacks the GPU capability

of block scheduling, however, and is therefore less effective.

Computer Graphics: Principles and Practice

Search WWH ::

Custom Search

Home