work. Let n be the vector's length, i_pred be the number of predicated instruction steps required to execute the vector to completion, and i_seq be the total number of instruction steps that would be required to execute the element's shaders sequentially, one at a time, as though on a single processor. Then the utilization of this vector's execution (u_vec) is the ratio of useful work done (i_seq) to the number of slots available for that work (n · i_pred):

    u_vec = i_seq / (n · i_pred).    (38.3)

In example B, the nondiverging case,

    u_vec = (6 + 6 + 6 + 6) / (4 × 6) = 1.0,    (38.4)

which is the maximum possible value, indicating full utilization. In example C, the diverging case,

    u_vec = (8 + 6 + 8 + 8) / (4 × 10) = 0.75,    (38.5)

indicating partial utilization. Predication ensures that an operation is executed for at least one element during each cycle, so the worst possible utilization for an n-wide vector core is 1/n. Minimum utilization is achieved in the switch-statement situation described above; it is approached asymptotically when a single element executes a path that is much longer than the paths executed by the other elements.
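The asymptotic approach to the 1/n floor is easy to check numerically. Suppose one element takes a path of length L while the other n − 1 elements take a short path of length s; with predication every element steps through both paths, spending L + s steps, so u_vec = (L + (n − 1)s) / (n(L + s)), which tends to 1/n as L grows. A minimal sketch (the path lengths here are illustrative, not from the text):

```python
# Utilization when one element runs a long path (length L) and the other
# n - 1 elements run a short path (length s). Predication executes both
# paths, so every element occupies L + s instruction steps.
def long_path_utilization(n, s, L):
    useful = L + (n - 1) * s      # i_seq: total sequential steps
    slots = n * (L + s)           # n * i_pred: available slots
    return useful / slots

for L in (10, 100, 10_000):
    print(L, round(long_path_utilization(4, 2, L), 4))
```

For n = 4 the printed values fall from 1/3 toward the floor of 1/4 as L increases.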
Utilization directly scales performance—the 0.75 utilization achieved in
example C corresponds to 75% of peak performance, or 33% additional running
time (when aggregated across many elements). Because poor utilization is the
direct result of divergence, it is useful to understand the likelihood of divergence,
perhaps as a first step to minimizing it.
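Equation (38.3) and the two worked examples can be verified in a few lines of Python, using the per-element step counts from Equations (38.4) and (38.5):

```python
def u_vec(seq_steps, pred_steps):
    """Utilization per Equation (38.3): useful work (sum of per-element
    sequential steps, i_seq) over available slots (n * i_pred)."""
    n = len(seq_steps)
    return sum(seq_steps) / (n * pred_steps)

print(u_vec([6, 6, 6, 6], 6))    # example B, nondiverging: 1.0
print(u_vec([8, 6, 8, 8], 10))   # example C, diverging: 0.75
```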
Again consider a shader with a single conditional branch. Let p be the branch's probability of taking the yes path, and 1 − p be its probability of taking the no path. Then, if p is evaluated independently for each element, that is, if evaluations of p had no locality, then divergence outcome probabilities for an n-wide vector core are:

    p^n                     no divergence, all yes outcomes
    (1 − p)^n               no divergence, all no outcomes      (38.6)
    1 − p^n − (1 − p)^n     divergence, various utilizations.

Unless p is either very near to zero or very near to one, the probability of diverging increases rapidly as vector length n increases. For example, the probability of divergence with p = 0.1 is 34% for n = 4, but it increases to 81% for n = 16, 97% for n = 32, and 99.9% for n = 64. Even with p = 0.01, a seemingly low probability, divergence occurs almost half the time (47%) for a vector length of 64. These odds might dissuade GPU architects from implementing wide vector units if they were correct, but in general they are not.
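The quoted figures follow directly from the last line of Equation (38.6); a quick numerical check, assuming independent per-element evaluations as the equation does:

```python
def p_diverge(p, n):
    """Probability that an n-wide vector diverges on a branch taken with
    probability p: 1 minus the two no-divergence cases of Eq. (38.6)."""
    return 1 - p**n - (1 - p)**n

for n in (4, 16, 32, 64):
    print(n, round(p_diverge(0.1, n), 3))   # 34%, 81%, 97%, 99.9%
print(64, round(p_diverge(0.01, 64), 3))    # about 47%
```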
In fact, evaluations of p are not independent: they tend to cluster into yes groups and no groups. Temporal locality predicts this: Clusters of repeated references suggest that the same code branch is executed repeatedly. The geometric nature of computer graphics often strengthens the effect. Consider the typical case of a predicate p that is true in shadow and false otherwise. Some triangles will be