But Larrabee's flexibility, generality, and capability come at a cost: It is sub-
stantially more difficult to program than traditional GPUs such as the GeForce
9800 GTX. And, even with the heroic programming efforts of Intel's experts,
the resultant OpenGL and Direct3D implementations performed significantly less
well than competitive, traditional GPUs. Larrabee's more general architecture
(ISA versus GeForce 9800 GTX's SPMD) contributes to this situation, of course,
but the relative lack of graphics specialization in Larrabee's implementation is the
root cause. While all graphics algorithms can be run on general-purpose vector
cores, for some algorithms this is a very inefficient approach. Traditional GPUs
such as the GeForce 9800 GTX are optimized to account for this: Independent,
per-element operations with rich, application-programmable semantics such as
those on vertices, primitives, and fragments are implemented on the data-parallel
vector cores, while pipeline-specific algorithms with inherently rigid semantics
such as rasterization (fragment generation) are implemented with specialized,
fixed-function units.
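The split described above can be sketched in code. The toy rasterizer below (illustrative only; all names are ours, not any vendor's implementation) generates fragments with edge functions, a rigid, pipeline-specific algorithm of the kind GPUs commit to fixed-function hardware, while the per-fragment shading step is an application-supplied function of the kind mapped onto the programmable, data-parallel cores.

```python
# Illustrative sketch: fragment generation is fixed and rigid, while
# per-fragment shading is application-programmable.

def edge(ax, ay, bx, by, px, py):
    # Signed-area test: positive when (px, py) lies to the left of edge a->b.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(v0, v1, v2, shade):
    """Generate fragments inside counterclockwise triangle v0-v1-v2.

    The loop structure and inside test never change per application -- a
    natural candidate for dedicated hardware -- whereas `shade` varies per
    application, so it belongs on the programmable cores.
    """
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    fragments = []
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            if (edge(*v0, *v1, px, py) >= 0 and
                    edge(*v1, *v2, px, py) >= 0 and
                    edge(*v2, *v0, px, py) >= 0):
                fragments.append((x, y, shade(x, y)))
    return fragments

# Example: a constant-color "shader" supplied by the application.
frags = rasterize((0, 0), (4, 0), (0, 4), lambda x, y: (255, 0, 0))
```

Real rasterizers add clipping, hierarchical tile traversal, and perspective-correct interpolation, but the rigid inner test is the part that benefits from dedicated hardware.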
Fixed-function units improve efficiency in several ways:

- Efficient parallelization: Specialized hardware can efficiently parallelize algorithms that are not inherently data-parallel.

- Correct provisioning: Algorithmic parameters such as numeric representation and precision can be optimized when the algorithm is implemented in task-dedicated hardware. Eight-bit integer multiplication, for example, requires less than one-tenth the hardware of 32-bit floating-point multiplication.

- Sequence optimization: When an algorithm is cast into dedicated hardware, each "step" is implemented with exactly the required capability (e.g., addition of two values) rather than effectively consuming the full capability of a core's ALU (addition, subtraction, multiplication, division, etc.). Furthermore, the sequence of steps is managed with dedicated hardware (e.g., a simple finite-state machine) rather than expressed as a program running on a core's instruction unit (whose stored-program model consumes expensive memory bandwidth and cache hierarchy).
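The provisioning claim can be checked with a back-of-the-envelope model (our own rough estimate, not a real gate count): a parallel multiplier's area is dominated by its partial-product array, which grows with the square of the operand width.

```python
# Rough hardware-cost model: one AND gate per partial-product bit,
# so a w-bit multiplier costs about w * w units of area.

def multiplier_cost(bits):
    return bits * bits

int8_cost = multiplier_cost(8)    # 8-bit integer multiply
fp32_cost = multiplier_cost(24)   # IEEE 754 binary32 has a 24-bit significand
                                  # (exponent and rounding logic ignored here)

ratio = fp32_cost / int8_cost
print(f"int8: {int8_cost}, fp32 significand: {fp32_cost}, ratio: {ratio:.1f}x")
```

Even this crude model gives a 9x ratio, and the exponent, normalization, and rounding logic that a floating-point unit also needs pushes the true ratio past the tenfold figure cited in the text.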
Altogether, these advantages can yield impressive savings. For example, all modern GPUs include specialized hardware for decoding video streams.21 While video decoding can be implemented using the data-parallel cores, it reportedly consumes 1/100th the power when run in a purpose-designed unit, allowing laptop computers to display movies without quickly draining their batteries. Similar savings
ratios may be achieved by fixed-function implementations of graphics pipeline
stages.
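To see why a 100x power ratio matters, consider the following illustration. The battery capacity and core-decode power figures are hypothetical numbers chosen by us, not measurements; only the 100x ratio comes from the text.

```python
# Hypothetical illustration of the cited 100x power savings for video decode.
# All absolute numbers are assumptions for the sake of the arithmetic.

BATTERY_WH = 50.0          # assumed laptop battery capacity, watt-hours
DECODE_ON_CORES_W = 20.0   # assumed decode power on the data-parallel cores
POWER_RATIO = 100.0        # savings ratio cited in the text

decode_fixed_w = DECODE_ON_CORES_W / POWER_RATIO   # power in a dedicated unit

# Ignoring the rest of the system (display, CPU, etc.) for simplicity:
hours_on_cores = BATTERY_WH / DECODE_ON_CORES_W
hours_fixed = BATTERY_WH / decode_fixed_w
print(f"on cores: {hours_on_cores:.1f} h, fixed-function: {hours_fixed:.0f} h")
```

In practice the rest of the system dominates once decode power drops this low, but the direction of the effect is clear: the dedicated unit turns decode from the largest power consumer into a negligible one.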
Larrabee's designers were not unaware of these advantages—they chose, for
example, to implement texture evaluation with a purpose-built, fixed-function
unit. But overall they shifted the traditional GPU implementation balance from
specialization toward generalization, trading the resultant loss of performance
on existing applications (OpenGL and Direct3D) for the opportunity to achieve
improved performance in new areas, such as alternative graphics pipelines, and
nongraphical algorithms. In a competitive market, performing better on new applications is a valuable differentiation, but performing well on existing applications is essential.
21. This hardware is not otherwise discussed in this chapter.