But Larrabee's flexibility, generality, and capability come at a cost: It is sub-
stantially more difficult to program than traditional GPUs such as the GeForce
9800 GTX. And, even with the heroic programming efforts of Intel's experts,
the resultant OpenGL and Direct3D implementations performed significantly less
well than competitive, traditional GPUs. Larrabee's more general architecture
(ISA versus GeForce 9800 GTX's SPMD) contributes to this situation, of course,
but the relative lack of graphics specialization in Larrabee's implementation is the
root cause. While all graphics algorithms can be run on general-purpose vector
cores, for some algorithms this is a very inefficient approach. Traditional GPUs
such as the GeForce 9800 GTX are optimized to account for this: Independent,
per-element operations with rich, application-programmable semantics such as
those on vertices, primitives, and fragments are implemented on the data-parallel
vector cores, while pipeline-specific algorithms with inherently rigid semantics
such as rasterization (fragment generation) are implemented with specialized,
fixed-function units.
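The split described above can be sketched in code. The toy rasterizer below (illustrative only; all names are ours, not any vendor's implementation) generates fragments with edge functions, a rigid, pipeline-specific algorithm of the kind GPUs commit to fixed-function hardware, while the per-fragment shading step is an application-supplied function of the kind mapped onto the programmable, data-parallel cores.

```python
# Illustrative sketch: fragment generation is fixed and rigid, while
# per-fragment shading is application-programmable.

def edge(ax, ay, bx, by, px, py):
    # Signed-area test: positive when (px, py) lies to the left of edge a->b.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(v0, v1, v2, shade):
    """Generate fragments inside counterclockwise triangle v0-v1-v2.

    The loop structure and inside test never change per application -- a
    natural candidate for dedicated hardware -- whereas `shade` varies per
    application, so it belongs on the programmable cores.
    """
    xs = [v[0] for v in (v0, v1, v2)]
    ys = [v[1] for v in (v0, v1, v2)]
    fragments = []
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            if (edge(*v0, *v1, px, py) >= 0 and
                    edge(*v1, *v2, px, py) >= 0 and
                    edge(*v2, *v0, px, py) >= 0):
                fragments.append((x, y, shade(x, y)))
    return fragments

# Example: a constant-color "shader" supplied by the application.
frags = rasterize((0, 0), (4, 0), (0, 4), lambda x, y: (255, 0, 0))
```

Real rasterizers add clipping, hierarchical tile traversal, and perspective-correct interpolation, but the rigid inner test is the part that benefits from dedicated hardware.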
Fixed-function units improve efficiency in several ways:

- Efficient parallelization: Specialized hardware can efficiently parallelize algorithms that are not inherently data-parallel.

- Correct provisioning: Algorithmic parameters such as numeric representation and precision can be optimized when the algorithm is implemented in task-dedicated hardware. Eight-bit integer multiplication, for example, requires less than one-tenth the hardware of 32-bit floating-point multiplication.

- Sequence optimization: When an algorithm is cast into dedicated hardware, each "step" is implemented with exactly the required capability (e.g., addition of two values) rather than effectively consuming the full capability of a core's ALU (addition, subtraction, multiplication, division, etc.). Furthermore, the sequence of steps is managed with dedicated hardware (e.g., a simple finite-state machine) rather than expressed as a program running on a core's instruction unit (whose stored-program model consumes expensive memory bandwidth and cache hierarchy).
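The provisioning claim can be checked with a back-of-the-envelope model (our own rough estimate, not a real gate count): a parallel multiplier's area is dominated by its partial-product array, which grows with the square of the operand width.

```python
# Rough hardware-cost model: one AND gate per partial-product bit,
# so a w-bit multiplier costs about w * w units of area.

def multiplier_cost(bits):
    return bits * bits

int8_cost = multiplier_cost(8)    # 8-bit integer multiply
fp32_cost = multiplier_cost(24)   # IEEE 754 binary32 has a 24-bit significand
                                  # (exponent and rounding logic ignored here)

ratio = fp32_cost / int8_cost
print(f"int8: {int8_cost}, fp32 significand: {fp32_cost}, ratio: {ratio:.1f}x")
```

Even this crude model gives a 9x ratio, and the exponent, normalization, and rounding logic that a floating-point unit also needs pushes the true ratio past the tenfold figure cited in the text.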
Altogether, these advantages can yield impressive savings. For example, all modern GPUs include specialized hardware for decoding video streams.21 While video decoding can be implemented using the data-parallel cores, it reportedly consumes 1/100th the power when run in a purpose-designed unit, allowing laptop computers to display movies without quickly draining their batteries. Similar savings
ratios may be achieved by fixed-function implementations of graphics pipeline
stages.
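To see why a 100x power ratio matters, consider the following illustration. The battery capacity and core-decode power figures are hypothetical numbers chosen by us, not measurements; only the 100x ratio comes from the text.

```python
# Hypothetical illustration of the cited 100x power savings for video decode.
# All absolute numbers are assumptions for the sake of the arithmetic.

BATTERY_WH = 50.0          # assumed laptop battery capacity, watt-hours
DECODE_ON_CORES_W = 20.0   # assumed decode power on the data-parallel cores
POWER_RATIO = 100.0        # savings ratio cited in the text

decode_fixed_w = DECODE_ON_CORES_W / POWER_RATIO   # power in a dedicated unit

# Ignoring the rest of the system (display, CPU, etc.) for simplicity:
hours_on_cores = BATTERY_WH / DECODE_ON_CORES_W
hours_fixed = BATTERY_WH / decode_fixed_w
print(f"on cores: {hours_on_cores:.1f} h, fixed-function: {hours_fixed:.0f} h")
```

In practice the rest of the system dominates once decode power drops this low, but the direction of the effect is clear: the dedicated unit turns decode from the largest power consumer into a negligible one.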
Larrabee's designers were not unaware of these advantages—they chose, for
example, to implement texture evaluation with a purpose-built, fixed-function
unit. But overall they shifted the traditional GPU implementation balance from
specialization toward generalization, trading the resultant loss of performance
on existing applications (OpenGL and Direct3D) for the opportunity to achieve
improved performance in new areas, such as alternative graphics pipelines, and
nongraphical algorithms. In a competitive market, performing better on new applications is a valuable differentiation, but performing well on existing applications is essential.
21. This hardware is not otherwise discussed in this chapter.