Graphics Reference
In-Depth Information
the factor W 1 is known as well. Only W is unknown. Thus, the summation above
is a huge quadratic in the unknown positions. If we place all of these unknown
positions into a 3 n
×
1 vector x , then the minimization can be rewritten in the
form
2 ,
w 1 , ... , w n
min
c
Ax
(25.39)
where A is a large, sparse matrix. In fact, if we place the x -coordinates of all the
unknowns in the first n entries of x , and then the y -components in the next n , and
then the z -components in the third n entries, the matrix A ends up block-diagonal,
consisting of three n
n blocks.
To minimize such a quadratic expression, we set the gradient to zero and solve;
the result is the system of linear equations
×
A T Ax = A T c .
(25.40)
The matrix A depends only on the known data, so it needs to be computed
only once, as does Q = A T A . And solving a problem of the form
Qx = A T c
(25.41)
can be done with the LU decomposition of Q (which we need only compute once)
and back-substitution. The LU decomposition can be found blockwise for the
three blocks, further simplifying the computation.
The trickiest part of implementing this algorithm concerns bookkeeping:
Transforming from an optimization in the form of Equation 25.35 to an optimiza-
tion in the form of Equation 25.37 involves careful index manipulation. If you
actually want to implement this idea, we strongly suggest that you first do so in
two dimensions, and that you work with an example mesh consisting of no more
than about five vertices and seven edges (which play the role of triangles in the
two-dimensional formulation). It will also help to formulate the computations in
a language like Matlab or Octave, in which matrix formulations are built into the
language. Once you have made that work, transferring to some other language is
much easier, especially since you can use the matrix-formulated implementation
as a reference during debugging.
25.6.2 Triangle Reordering for Hardware Efficiency
As you know, the graphics pipeline, as implemented in the GPU, has several
stages. Any one of these can be a bottleneck. For some models, transforming
vertices dominates, and modern GPUs tend to cache such transformed vertices
because of this. When it comes to generating triangles to be drawn, we get bet-
ter cache use if those that share vertices are processed at nearly the same time.
Triangle strips are one way to produce this kind of mesh locality. For other mod-
els, there may be enormous complexity that is mostly hidden (e.g., a model of
an office building may contain millions of triangles, of which only a few hun-
dred are visible from any particular office space). In these cases, if we can draw
the visible triangles first, then a z -test will tell us that the fragments generated by
other triangles are not visible, and thus that the lighting and shading computations
for those fragments need not be performed. Of course, backface culling also will
help reduce the rendering load by about 50% on average. By clever clustering of
 
 
Search WWH ::




Custom Search