Read requests that are fulfilled by cached data (cache hits) have dramatically lower latency than those fulfilled from main memory (cache misses). Taking main-memory latency as the benchmark, this disparity is desirable: If most memory requests hit, latency is dramatically reduced. But it is tempting instead to take the cache's hit latency as the benchmark, because this performance is achieved asymptotically as the cache-miss rate goes to zero. Unfortunately, the large disparity in latencies is undesirable from this viewpoint, because even a few misses dramatically increase average latency. For example, the average latency of a cache with a miss penalty of 100× is doubled by a miss rate of only 1%. In practice, only very large cache memories achieve average read latencies that approach this hit latency.
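A few lines of C make this arithmetic concrete. This is only a sketch of the standard average-latency formula; the 1-cycle hit latency and 100-cycle miss penalty are assumed values chosen to match the 100× example above.

    #include <stdio.h>

    /* Average read latency = hit latency + miss rate * miss penalty.
     * The unit latencies below are illustrative assumptions. */
    int main(void) {
        const double hit_latency  = 1.0;    /* cycles, assumed                */
        const double miss_penalty = 100.0;  /* extra cycles per miss, assumed */

        for (int pct = 0; pct <= 5; ++pct) {
            double miss_rate = pct / 100.0;
            double avg = hit_latency + miss_rate * miss_penalty;
            printf("miss rate %d%%  ->  average latency %.2f cycles\n", pct, avg);
        }
        return 0;
    }

At a 1% miss rate the program prints an average latency of 2.00 cycles, twice the hit latency, confirming the doubling claimed above.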
Cache memory is organized into equal-size units called lines, which are typically much larger than a single data item. Transfers between the processor and cache memory operate at the granularity of individual data items—a word is read from a line in the cache and returned to the processor, or a byte is written from the processor into the appropriate cache line. But transfers between cache and main memory operate at cache-line granularity—entire cache lines are either read from or written to main memory. Cache-line size is chosen so that these transfers make efficient use of main-memory bandwidth. For example, cache lines may be as large as the blocks in main memory, or at least a substantial fraction of this size. When a cache read miss forces a line to be loaded from main memory, spatial locality ensures that most if not all of the data items in that line will be accessed before the line is overwritten by another. And caches can be designed to transfer lines back to main memory infrequently (write-back cache) rather than immediately after the processor writes a data item to the cache (write-through cache), minimizing the main-memory bandwidth consumed by writing, and thereby maximizing the main-memory bandwidth available for reading.
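The following C sketch illustrates these two transfer granularities and the write-back policy, assuming a direct-mapped cache with 64-byte lines and a flat 1 MiB main memory (all sizes are illustrative, not taken from any particular processor). Processor-side accesses touch single bytes, all main-memory traffic moves whole lines, and the dirty bit defers write-back until eviction.

    #include <stdint.h>
    #include <string.h>

    #define LINE_BYTES 64   /* assumed cache-line size */
    #define NUM_LINES  128  /* assumed number of lines */

    typedef struct {
        uint32_t tag;
        int      valid;
        int      dirty;                /* set on write; forces write-back */
        uint8_t  data[LINE_BYTES];
    } CacheLine;

    static CacheLine cache[NUM_LINES];
    static uint8_t   main_memory[1 << 20];  /* assumed 1 MiB backing store */

    /* Bring the line containing addr into the cache, writing the evicted
     * line back to main memory only if it is dirty (write-back policy). */
    static CacheLine *fetch_line(uint32_t addr) {
        uint32_t line_addr = addr / LINE_BYTES;
        uint32_t index     = line_addr % NUM_LINES;
        uint32_t tag       = line_addr / NUM_LINES;
        CacheLine *line    = &cache[index];

        if (!line->valid || line->tag != tag) {   /* miss */
            if (line->valid && line->dirty) {     /* write back evicted line */
                uint32_t old = (line->tag * NUM_LINES + index) * LINE_BYTES;
                memcpy(&main_memory[old], line->data, LINE_BYTES);
            }
            /* Whole-line transfer from main memory, never a single byte. */
            memcpy(line->data, &main_memory[line_addr * LINE_BYTES], LINE_BYTES);
            line->tag   = tag;
            line->valid = 1;
            line->dirty = 0;
        }
        return line;
    }

    /* Processor-side transfers operate on individual data items. */
    uint8_t read_byte(uint32_t addr) {
        return fetch_line(addr)->data[addr % LINE_BYTES];
    }

    void write_byte(uint32_t addr, uint8_t value) {
        CacheLine *line = fetch_line(addr);
        line->data[addr % LINE_BYTES] = value;
        line->dirty = 1;  /* defer the main-memory update until eviction */
    }

    int main(void) {
        write_byte(0, 42);                       /* dirties cache line 0      */
        write_byte(LINE_BYTES * NUM_LINES, 7);   /* evicts it: forced write-back */
        return main_memory[0] == 42 ? 0 : 1;     /* 42 reached main memory    */
    }

A write-through cache would instead copy each write to main memory immediately; the dirty bit is what lets many processor writes to the same line be folded into a single line transfer. Real caches are set-associative and service many requests concurrently, which this sketch omits.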
From the standpoint of the processor, cache memory addresses both of the key
concerns of main memory: Apparent memory latency is reduced, and apparent
memory bandwidth is increased. If cache memory size could be made arbitrarily
large, both apparent latency and apparent bandwidth could in principle be driven
to the point of diminishing return (i.e., to the point where further improvement
would not increase processor performance). In practice, cache size is limited to a small fraction of the size of main memory; once the working set exceeds what the cache can hold, performance slows to that of main memory. Because apparent latency increases quickly even for very
low miss rates, GPU implementations are typically tuned to achieve performance
that is unconstrained by memory bandwidth (assuming typical graphics loading)
with caches that are far too small to ensure the required latency. The (otherwise
unacceptable) apparent memory latency is hidden by multithreading, as described
in Section 38.6.3, rather than by outsized cache memories.
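A back-of-envelope calculation shows why multithreading can stand in for a large cache. By Little's law, the number of memory requests that must be in flight to keep the memory system fully utilized equals latency times sustained request rate; the 400-cycle latency and one-request-per-cycle issue rate below are assumptions chosen only for illustration.

    #include <stdio.h>

    /* Concurrency needed to hide memory latency (Little's law):
     * requests in flight = latency x sustained request rate.
     * Both figures below are illustrative assumptions. */
    int main(void) {
        const double latency_cycles     = 400.0;  /* assumed memory latency       */
        const double requests_per_cycle = 1.0;    /* assumed sustained issue rate */

        double in_flight = latency_cycles * requests_per_cycle;
        printf("requests that must be in flight: %.0f\n", in_flight);
        return 0;
    }

If each thread sustains only one outstanding request, roughly 400 threads must be resident to cover the assumed latency, which is one reason GPUs keep very large numbers of threads in flight rather than growing their caches.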
It is still possible for shader programmers to get in trouble, though, by demanding more memory bandwidth than is available. For example, GPU texture interpolation performance is typically balanced assuming high data locality. If this assumption is disrupted—if, for example, texture sample addresses specify disjoint, widely separated clusters of texels—then an excessively large number of memory blocks may be transferred from main memory to cache memory, and shader performance can plummet (the sketch at the end of this section makes the effect concrete). Undersampling a texture is one way to create this situation. Thus, texture aliasing not only destroys image quality, it can also destroy GPU performance! Dependent texture reads, meaning calls to tex1D, tex2D, or tex3D, with a parameter that is not directly derived from the
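To make the scattered-sampling scenario above concrete, the following C sketch counts how many (assumed) 64-byte cache lines a batch of texel fetches touches; the 1K×1K texture, 4-byte texels, and the particular scattered stride are all illustrative assumptions.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define LINE_BYTES  64    /* assumed cache-line size */
    #define TEXEL_BYTES 4     /* assumed bytes per texel */
    #define TEX_SIZE    1024  /* assumed 1K x 1K texture */
    #define NUM_LINES   (TEX_SIZE * TEX_SIZE * TEXEL_BYTES / LINE_BYTES)

    static uint8_t touched[NUM_LINES];

    /* Count the distinct cache lines touched by n texel fetches. */
    static int count_lines(const uint32_t *texel, int n) {
        int lines = 0;
        memset(touched, 0, sizeof touched);
        for (int i = 0; i < n; ++i) {
            uint32_t line = texel[i] * TEXEL_BYTES / LINE_BYTES;
            if (!touched[line]) { touched[line] = 1; ++lines; }
        }
        return lines;
    }

    int main(void) {
        enum { N = 256 };
        uint32_t coherent[N], scattered[N];
        for (int i = 0; i < N; ++i) {
            coherent[i]  = (uint32_t)i;           /* adjacent texels         */
            scattered[i] = (uint32_t)(i * 4099);  /* widely separated texels */
        }
        printf("coherent : %d lines for %d fetches\n", count_lines(coherent,  N), N);
        printf("scattered: %d lines for %d fetches\n", count_lines(scattered, N), N);
        return 0;
    }

The coherent pattern touches 16 lines for 256 fetches, while the scattered pattern touches 256: a 16× increase in line traffic for exactly the same number of samples. This is the bandwidth amplification that can make an undersampled texture so expensive.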