Modern Graphics Hardware - Computer Graphics: Principles and Practice


performance of well-engineered components to increase at rates approaching 100% per year.

GPU designers have made excellent use of increases in both transistor count and circuit speed. Over the ten years between 2001 and 2010, performance metrics of NVIDIA GPUs, such as triangles drawn per second and pixels drawn per second, have increased by 70% to 90% per year. Memory bandwidth has increased by only 50% per year, but data compression techniques, themselves enabled by increased circuit complexity, have allowed an increase in effective memory bandwidth of 80% per year, supporting the correspondingly large increases in drawing performance.
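The gap between raw and effective memory bandwidth growth can be made concrete with a small calculation. The sketch below assumes that effective bandwidth is simply raw bandwidth multiplied by a compression gain. This is a simplifying model introduced here for illustration, not one stated in the text, and the variable names are hypothetical.

```python
# Annual growth rates quoted in the text.
RAW_RATE = 0.50        # raw memory bandwidth growth per year
EFFECTIVE_RATE = 0.80  # effective memory bandwidth growth per year

# Assumption (not from the text): effective = raw * compression gain,
# so the yearly compression gain is the ratio of the two growth factors.
compression_gain_per_year = (1.0 + EFFECTIVE_RATE) / (1.0 + RAW_RATE)

print(f"Implied compression improvement: {compression_gain_per_year - 1.0:.0%} per year")
```

Under this assumption, compression would need to contribute roughly a 20% improvement per year to account for the difference between the two quoted rates.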

CPU designers have taken full advantage of increased circuit speed, but they have been less successful at converting increased transistor count into performance. CPU performance has historically increased by 50% per year, an amazing achievement, but still significantly lower than the 70% to 90% annual increases in GPU performance. Today's relative performance advantage of GPUs over CPUs is the direct result of compounding performance at unequal rates.
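To see how quickly unequal annual rates diverge, the following sketch compounds the growth figures quoted above. The helper name `compound_growth` and the choice of ten compounding steps (for the 2001 to 2010 period) are assumptions made for illustration. The result indicates relative growth only and is not a measured performance ratio.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Return the total growth factor after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


YEARS = 10  # assumption: ten compounding steps for the 2001-2010 period

cpu = compound_growth(0.50, YEARS)       # CPU: 50% per year
gpu_low = compound_growth(0.70, YEARS)   # GPU, low end: 70% per year
gpu_high = compound_growth(0.90, YEARS)  # GPU, high end: 90% per year

print(f"CPU growth factor: {cpu:.1f}x")
print(f"GPU growth factor: {gpu_low:.1f}x to {gpu_high:.1f}x")
print(f"Relative GPU gain: {gpu_low / cpu:.1f}x to {gpu_high / cpu:.1f}x")
```

With these inputs the CPU factor is about 58, while the GPU factor falls between roughly 200 and 610, so a decade of compounding alone opens a gap of about 3.5 to 10.6 times.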

Shortly after the year 2000, CPUs reached a power dissipation somewhat over 100 watts, near the maximum that can be dissipated by a single component in a personal computer cabinet. Because circuit power is directly proportional to circuit speed, the annual 20% increase in CPU clock speed, which had been a major driver of CPU performance increases for two decades, dropped suddenly to essentially zero. This event has motivated CPU designers to incorporate more parallelism into their circuits, an approach that has been very successful for GPUs. The Intel Core 2 Extreme QX9770 CPU is a quad-core design, meaning that it contains four microprocessor cores in a single component package. Dual-core Intel CPUs were introduced in 2005, and quad-core designs are now available. Each core has four floating-point Arithmetic and Logic Units (ALUs), each ALU including an addition unit and a multiplication unit, for a total of 32 floating-point units in the Core 2 Extreme QX9770. By comparison, the NVIDIA GeForce 9800 GTX GPU has 16 cores, each with eight floating-point ALUs, and each ALU with two multiplication units and one addition unit, for a total of 384 floating-point units. While the CPU cores are clocked at roughly twice the rate of the GPU cores (3.2 GHz versus 1.5 GHz), the sheer number of floating-point units in the GPU gives it a greater than five-to-one GFLOPS advantage (576 to 102).
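The unit counts and GFLOPS figures above follow from simple multiplication, which the sketch below reproduces. It assumes that every floating-point unit completes one operation per clock cycle. That assumption matches the quoted totals but is not stated explicitly in the text. The `Processor` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    cores: int
    alus_per_core: int
    units_per_alu: int  # addition plus multiplication units in each ALU
    clock_ghz: float

    @property
    def fp_units(self) -> int:
        """Total floating-point units across all cores."""
        return self.cores * self.alus_per_core * self.units_per_alu

    @property
    def peak_gflops(self) -> float:
        """Peak GFLOPS, assuming one operation per unit per clock cycle."""
        return self.fp_units * self.clock_ghz


# Figures taken from the paragraph above.
cpu = Processor("Intel Core 2 Extreme QX9770", cores=4, alus_per_core=4,
                units_per_alu=2, clock_ghz=3.2)
gpu = Processor("NVIDIA GeForce 9800 GTX", cores=16, alus_per_core=8,
                units_per_alu=3, clock_ghz=1.5)

for p in (cpu, gpu):
    print(f"{p.name}: {p.fp_units} units, {p.peak_gflops:.1f} peak GFLOPS")
print(f"GPU advantage: {gpu.peak_gflops / cpu.peak_gflops:.2f} to 1")
```

These are peak figures under the stated assumption. The sketch gives 32 units and 102.4 GFLOPS for the CPU, 384 units and 576 GFLOPS for the GPU, and a ratio of about 5.6 to 1.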

In summary, as of 2009, GPUs sustained significantly higher performance than CPUs because they were doing more calculations in parallel, and they did this by devoting a greater percentage of their silicon area to computation than CPUs did. As of 2013, there's no sign of this trend slowing down. GPU parallelism is considered further in the following sections.

38.3 Architecture and Implementation

When you write code that uses a GPU for graphics, you do not directly manipulate the hardware circuits of the GPU. Instead, you code to an abstraction, which is implemented by a combination of the GPU hardware, GPU firmware (code that runs on the GPU), and a device driver (code that runs on the CPU). The firmware and the device driver are implemented and maintained by the GPU's manufacturer, which is NVIDIA in the case of the GeForce 9800 GTX. They are as fundamental to the implementation of the abstraction as the GPU hardware itself.
