but constrains optimizations by the GPU driver (e.g., a compromise tiling of the
texture image may be chosen). 14
38.7.2 Cache Memory
We've discussed what data locality is, and seen some examples of how GPU archi-
tecture and implementation are designed to increase data locality. Now we'll see
how data locality can be exploited by system designers to greatly improve the
performance of computing systems.
Recall from Section 38.6.2 that latency and bandwidth, the two greatest con-
cerns of a memory-system designer, are inversely related to memory capacity:
Smaller memories have lower access latency and higher access bandwidth, that
is, they are faster than larger memories. Because locality of reference makes it
likely that, for a given short window of time, most memory accesses are to a small
number of physical memory blocks, we can improve performance by storing these
blocks in a smaller, faster memory. There are two ways to expose this local mem-
ory: explicitly and implicitly.
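The payoff of locality can be seen even without touching the memory system explicitly. The sketch below (an illustrative example, not from this chapter) sums the same matrix twice: once in row-major order, where consecutive accesses fall in the same cached block, and once in column-major order, where successive accesses are far apart and each may touch a different block. The results are identical; only the locality, and hence the performance, differs.

```c
#include <stddef.h>

#define N 1024

/* Row-major traversal: consecutive accesses touch consecutive
 * addresses, so each block fetched into fast local memory is
 * fully used before it is replaced. */
static long sum_row_major(int m[N][N]) {
    long total = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            total += m[i][j];
    return total;
}

/* Column-major traversal of the same data: successive accesses
 * are N * sizeof(int) bytes apart, so each access may fall in a
 * different block, wasting most of every block fetched. */
static long sum_col_major(int m[N][N]) {
    long total = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            total += m[i][j];
    return total;
}
```

On typical hardware the row-major version runs several times faster, even though both functions perform exactly the same arithmetic.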
The explicit approach directly exposes local memory in the architecture, giv-
ing programmers complete control of its use. Address fields of explicit local
memory (which is small) require fewer bits than addresses for the main memory
system (which is large), so their use reduces the bit rate of the processor instruction stream, while also benefiting from the reduced latency and increased bandwidth
to the specified data. Registers are an extreme form of explicit local memory: They
require very small addresses, and they exhibit negligible latency and huge band-
width. The local memory blocks in Figure 38.3 are a more typical form of explicit
local memory. Because these memories are not visible in either the OpenGL or the
Direct3D pipeline models, we defer discussion to Section 38.9.
Although registers are explicit in the Direct3D architecture, their allocation and use are managed by the shader compiler rather than by programmers. In principle, shader compilers could also manage and optimize the use of other local memory, but in practice this is left to human programmers. Programmer-specified
management of local memory is powerful, but it is also complex, time-consuming,
and error-prone. Thus, it is desirable to provide a form of local storage that is man-
aged implicitly and automatically by low-level (typically hardware-implemented)
mechanisms within the architecture. We refer to such local memory as cache (pro-
nounced “cash”) memory.
Cache memory intercepts all accesses to main memory. If the requested data
item is already present in the cache, it is either returned with low latency (in the
case of a read request) or modified in place (in the case of a write request). If
the requested item is not present in the cache, some previously cached item is
evicted, the requested item is read from main memory into the cache, and then
either returned with high latency (in the case of a read request) or modified in place
(in the case of a write request). All of this happens implicitly, without programmer
intervention, so there is no opportunity for programmer error. But programmers
can significantly influence performance by coding to maximize data locality, since
accesses to main memory are so costly.
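The hit/miss behavior described above can be sketched as a small simulation. The code below models a direct-mapped cache, a simplifying assumption (real caches are usually set-associative); every main-memory block maps to exactly one cache line, and a tag records which block currently occupies that line. All names here are illustrative, not part of any real hardware interface.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_LINES  256   /* number of cache lines */
#define BLOCK_SIZE 64    /* bytes per line */

typedef struct {
    bool     valid;            /* does this line hold any block?   */
    uint32_t tag;              /* which memory block it holds      */
    uint8_t  data[BLOCK_SIZE];
} CacheLine;

typedef struct {
    CacheLine lines[NUM_LINES];
    unsigned  hits, misses;
} Cache;

/* Read one byte through the cache. On a hit the byte is returned
 * with low latency; on a miss the resident block is evicted, the
 * requested block is fetched from main memory (modeled as a flat
 * array), and only then is the byte returned. */
uint8_t cache_read(Cache *c, const uint8_t *main_mem, uint32_t addr) {
    uint32_t block  = addr / BLOCK_SIZE;
    uint32_t line   = block % NUM_LINES;  /* direct mapping */
    uint32_t offset = addr % BLOCK_SIZE;
    CacheLine *l = &c->lines[line];

    if (l->valid && l->tag == block) {
        c->hits++;                        /* low-latency path  */
    } else {
        c->misses++;                      /* high-latency path */
        memcpy(l->data, main_mem + (size_t)block * BLOCK_SIZE, BLOCK_SIZE);
        l->tag   = block;
        l->valid = true;
    }
    return l->data[offset];
}
```

Reading a region sequentially makes the benefit of locality concrete: the first byte of each 64-byte block misses, and the remaining 63 bytes hit, so only one slow main-memory transfer is paid per block.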
14. OpenGL versions 3.3 and later have adopted the Direct3D approach of decoupling
texture images and interpolation modes.