performance of well-engineered components to increase at rates approaching
100% per year.
GPU designers have made excellent use of increases in both transistor count and circuit speed. Over the ten years between 2001 and 2010, performance metrics of NVIDIA GPUs, such as triangles drawn per second and pixels drawn per second, have increased by 70% to 90% per year. Memory bandwidth has increased by only 50% per year. But data compression techniques, themselves enabled by increased circuit complexity, have allowed an increase in effective memory bandwidth of 80% per year, supporting the correspondingly large increases in drawing performance.
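One way to read these bandwidth figures: if effective bandwidth is modeled, as a simplification introduced here for illustration, as raw bandwidth multiplied by an average compression ratio r, then the annual growth factors multiply. Under that assumption, the quoted 50% and 80% rates imply that the compression ratio itself had to improve by roughly 20% per year:

```latex
B_{\text{eff}} = r\,B_{\text{raw}}
\;\Longrightarrow\;
\frac{B_{\text{eff}}(t+1)}{B_{\text{eff}}(t)}
  = \frac{r(t+1)}{r(t)}\cdot\frac{B_{\text{raw}}(t+1)}{B_{\text{raw}}(t)}
\;\Longrightarrow\;
\frac{r(t+1)}{r(t)} = \frac{1.80}{1.50} = 1.20
```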
CPU designers have taken full advantage of increased circuit speed, but they have been less successful at converting increased transistor count into performance. CPU performance has historically increased by 50% per year, an amazing achievement, but still significantly lower than the 70% to 90% annual increases in GPU performance. Today's relative performance advantage of GPUs over CPUs is the direct result of compounding performance at unequal rates.
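To see how quickly unequal rates compound, the following sketch computes the GPU-to-CPU performance ratio after a given number of years. It assumes both processors start at equal performance and grow at constant annual rates; the equal starting point and the 10-year horizon are illustrative assumptions, not figures from the text, and `relative_advantage` is a hypothetical helper.

```python
def relative_advantage(gpu_rate: float, cpu_rate: float, years: int) -> float:
    """Ratio of GPU to CPU performance after `years` of compounding.

    Rates are fractional annual increases (0.5 means +50% per year).
    Assumes (for illustration) that both start at equal performance.
    """
    return ((1.0 + gpu_rate) / (1.0 + cpu_rate)) ** years


if __name__ == "__main__":
    cpu_rate = 0.50  # historical CPU rate quoted in the text
    for gpu_rate in (0.70, 0.90):  # low and high ends of the quoted GPU range
        ratio = relative_advantage(gpu_rate, cpu_rate, 10)
        print(f"GPU +{gpu_rate:.0%}/yr vs CPU +{cpu_rate:.0%}/yr, 10 years: {ratio:.1f}x")
```

With the 50% CPU rate and the two ends of the GPU range, the ratios come out to roughly 3.5x and 10.6x after ten years, so even a modest gap in annual rates grows into a large performance difference.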
Shortly after the year 2000, CPUs reached a power dissipation somewhat over 100 watts, near the maximum that can be dissipated by a single component in a personal computer cabinet. Because circuit power is directly proportional to circuit speed, the annual 20% increase in CPU clock speed, which had been a major driver of CPU performance increases for two decades, dropped suddenly to essentially zero. This event has motivated CPU designers to incorporate more parallelism into their circuits, an approach that has been very successful for GPUs. The Intel Core 2 Extreme QX9770 CPU is a quad-core design, meaning that it contains four microprocessor cores in a single component package. Dual-core Intel CPUs were introduced in 2005, and quad-core designs are now available. Each core has four floating-point Arithmetic and Logic Units (ALUs), each ALU including an addition unit and a multiplication unit, for a total of 32 floating-point units in the Core 2 Extreme QX9770. By comparison, the NVIDIA GeForce 9800 GTX GPU has 16 cores, each with eight floating-point ALUs, and each ALU with two multiplication units and one addition unit, for a total of 384 floating-point units. While the CPU cores are clocked at roughly twice the rate of the GPU cores (3.2 GHz versus 1.5 GHz), the sheer number of floating-point units in the GPU gives it a greater than five-to-one advantage in peak GFLOPS (576 versus 102).
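The unit counts and GFLOPS figures in the comparison follow from simple multiplication. This sketch reproduces them from the numbers quoted above, assuming (as the comparison implicitly does) that every floating-point unit completes one operation per clock cycle. The `Processor` class and its field names are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    """Hypothetical description of a processor's floating-point resources."""
    name: str
    cores: int
    alus_per_core: int
    fp_units_per_alu: int  # adders plus multipliers in each ALU
    clock_ghz: float

    @property
    def fp_units(self) -> int:
        return self.cores * self.alus_per_core * self.fp_units_per_alu

    @property
    def peak_gflops(self) -> float:
        # Assumption: each floating-point unit finishes one operation per cycle.
        return self.fp_units * self.clock_ghz


# Figures as quoted in the text.
cpu = Processor("Intel Core 2 Extreme QX9770", cores=4, alus_per_core=4,
                fp_units_per_alu=2, clock_ghz=3.2)
gpu = Processor("NVIDIA GeForce 9800 GTX", cores=16, alus_per_core=8,
                fp_units_per_alu=3, clock_ghz=1.5)

if __name__ == "__main__":
    for p in (cpu, gpu):
        print(f"{p.name}: {p.fp_units} FP units, {p.peak_gflops:.1f} peak GFLOPS")
    print(f"GPU/CPU peak ratio: {gpu.peak_gflops / cpu.peak_gflops:.2f}")
```

This yields 32 units and about 102.4 GFLOPS for the CPU, and 384 units and 576 GFLOPS for the GPU, a ratio of about 5.6. These are theoretical peaks; sustained throughput depends on how well a workload keeps all of the units busy.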
In summary, as of 2009, GPUs sustained significantly higher performance than CPUs because they performed more calculations in parallel, and they did this by devoting a greater percentage of their silicon area to computation than CPUs did. As of 2013, this trend shows no sign of slowing down. GPU parallelism is considered further in the following sections.
38.3 Architecture and Implementation
When you write code that uses a GPU for graphics, you do not directly manipulate
the hardware circuits of the GPU. Instead, you code to an abstraction, which is
implemented by a combination of the GPU hardware, GPU firmware (code that
runs on the GPU), and a device driver (code that runs on the CPU). The firmware
and the device driver are implemented and maintained by the GPU's manufacturer,
which is NVIDIA in the case of the GeForce 9800 GTX. They are as fundamental
to the implementation of the abstraction as the GPU hardware itself.
 
 