Transformations and the Graphics Pipeline (Basic Computer Graphics) Part 4

Parallel Projections

So far we have dealt solely with perspective views, but there are times when one wants views based on parallel projections. Although this can be thought of as a special case of central projections where the camera is moved to “infinity” along some direction, it is worth considering on its own because one can achieve some simplifications in that case.

Assume that our view plane is the x-y plane and that we are interested in the parallel projection of the world onto that plane using a family of parallel lines. See Figure 4.16.

Proposition. If p is the parallel projection of R3 onto R2 with respect to a family of parallel lines with direction vector v = (v1,v2,v3), where v3 is nonzero, then

$$p(x,y,z) = \left(x - \frac{v_1}{v_3}\,z,\; y - \frac{v_2}{v_3}\,z\right).$$

Figure 4.16. A parallel projection onto the x-y plane.


Proof. Exercise 4.9.1.

Passing to homogeneous coordinates, consider the projective transformation Tpar: P3 → P3 defined by the matrix

$$M_{par} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -v_1/v_3 & -v_2/v_3 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

which acts on row vectors, so that (x,y,z,1) Mpar = (x - (v1/v3)z, y - (v2/v3)z, z, 1).

Our parallel projection onto the x-y plane is then nothing but the Cartesian version of Tpar followed by the orthogonal projection (x,y,z) → (x,y,0). It follows that the matrix Mpar plays the role of the matrix Mpersp in Section 4.5 (equation (4.9)) in that it reduces a general projection problem to a simple orthogonal projection.
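The two-step recipe above (apply the matrix, then drop the z-coordinate) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: points are row vectors multiplied on the right by the matrix, the third row of the matrix is taken to be (-v1/v3, -v2/v3, 1, 0), and the function names are hypothetical.

```python
def parallel_projection_matrix(v):
    """4 x 4 homogeneous matrix for projecting along v onto the x-y plane.

    Assumes the row-vector convention: a point (x, y, z, 1) is multiplied
    on the right by this matrix.
    """
    v1, v2, v3 = v
    if v3 == 0:
        raise ValueError("direction vector must not be parallel to the x-y plane")
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [-v1 / v3, -v2 / v3, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply_row(row, m):
    """Row vector times matrix."""
    return [sum(row[i] * m[i][j] for i in range(len(row))) for j in range(len(m[0]))]


def parallel_project(point, v):
    """Apply the matrix, return to Cartesian coordinates, then drop z."""
    x, y, z, w = apply_row([*point, 1.0], parallel_projection_matrix(v))
    return (x / w, y / w)


print(parallel_project((1.0, 2.0, 4.0), (1.0, 0.0, 2.0)))
# Scaling the direction vector leaves the result unchanged.
print(parallel_project((1.0, 2.0, 4.0), (3.0, 0.0, 6.0)))
```

Both calls give the same point of the x-y plane, because only the ratios v1/v3 and v2/v3 enter the matrix.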

Notice that a parallel projection does not depend on the length of the vector v. In fact, any multiple of v will define the same projection, as is easily seen from its equations. The parallel projection can also be considered the limiting case of a central projection where one places an eye at a position v = (v1,v2,v3) = (a’d,b’d,-d) and one lets d go to infinity. This moves the eye off to infinity along a line through the origin with direction vector v. The larger d gets, the more parallel are the rays from the eye to the points of an object. The matrix Meye in equation (4.11) (with a = a’d and b = b’d) approaches Mpar because 1/d goes to zero.
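The limiting behavior can be checked numerically. The sketch below does not use the matrix Meye of equation (4.11); instead it intersects the ray from the eye through a point with the x-y plane directly, which is an assumption made here to keep the example self-contained. The values chosen for a', b', and the test point are arbitrary.

```python
def central_project(point, eye):
    """Intersect the line through eye and point with the plane z = 0."""
    x, y, z = point
    ex, ey, ez = eye
    t = ez / (ez - z)  # parameter at which the z-coordinate becomes zero
    return (ex + t * (x - ex), ey + t * (y - ey))


def parallel_project(point, v):
    """Parallel projection onto the x-y plane along direction v (v3 nonzero)."""
    x, y, z = point
    v1, v2, v3 = v
    return (x - v1 / v3 * z, y - v2 / v3 * z)


a_prime, b_prime = 0.5, -0.25  # arbitrary illustrative values
point = (1.0, 2.0, 4.0)
target = parallel_project(point, (a_prime, b_prime, -1.0))

errors = []
for d in (10.0, 1e3, 1e6):
    eye = (a_prime * d, b_prime * d, -d)
    cx, cy = central_project(point, eye)
    errors.append(max(abs(cx - target[0]), abs(cy - target[1])))
    print(d, errors[-1])
```

The printed discrepancy shrinks roughly like 1/d, in line with the remark that the rays become more nearly parallel as the eye recedes.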

An even simpler case occurs when the vector v is orthogonal to the view plane.

Definition. A parallel projection where the lines we are projecting along are orthogonal to the view plane is called an orthographic (or orthogonal) projection. If the lines have a direction vector that is not orthogonal to the view plane, we call it an oblique (parallel) projection. A view of the world obtained via an orthographic or oblique projection is called an orthographic or oblique view, respectively.

A single projection of an object is obviously not enough to describe its shape.

Definition. An axonometric projection consists of a set of parallel projections that shows at least three adjacent faces. A view of the world obtained via an axonometric projection is called an axonometric view.

Figure 4.17. Perspective and orthographic views of a 2 x 5 x 3 block.

In engineering drawings one often shows a perspective view along with three orthographic views – a top, front, and side view, corresponding to looking along the z-, y-, and x-axis, respectively. See Figure 4.17. For a more detailed taxonomy of projections see [RogA90].

Finally, in a three-dimensional graphics program one might want to do some 2d graphics. For example, one might want to let a user define curves in the plane. Rather than maintaining a separate 2d structure for these planar objects it would be more convenient to think of them as 3d objects. Using the orthographic projection, one can simulate a 2d world for the user.

Homogeneous Coordinates: Pro and Con

The computer graphics pipeline as we have described it made use of homogeneous coordinates when it came to clipping. The given reason for this was that it avoids a division by zero problem. How about using homogeneous coordinates and matrices everywhere? This section looks at some issues related to this question. We shall see that both mathematical and practical considerations come into play.

Disadvantages of the Homogeneous Coordinate Representation. The main disadvantage has to do with efficiency. First, it takes more space to store 4-tuples and 4 x 4 matrices than 3-tuples and 3 x 4 matrices (frames). Second, 4 x 4 matrices need more multiplications and additions to act on a point than 3 x 4 matrices. Another disadvantage is that homogeneous coordinates are less easy to understand than Cartesian coordinates.

Advantages of the Homogeneous Coordinate Representation. In a word, the advantage is uniformity. The composite of transformations can be dealt with in a more uniform way (we simply do matrix multiplication) and certain shape manipulations become easier using a homogeneous matrix for the shape-to-world coordinate system transformation.

Figure 4.18. Parts of a homogeneous matrix.

Figure 4.19. Transformation examples.

Furthermore, computer hardware can be optimized to deal with 4 x 4 matrices, which more than compensates for the computational inefficiency mentioned above.

Let us look at the advantage of homogeneous coordinates in more detail. To see the geometric power contained in a 4 x 4 homogeneous matrix consider Figure 4.18. The matrix can be divided into the four parts L, T, P, and S as shown, each of which by itself has a simple geometric interpretation. The matrix corresponds to an affine map if and only if P is zero and in that case we have a linear transformation defined by L followed by a translation defined by T. If P is nonzero, then some plane will be mapped to infinity. We illustrate this with the examples shown in Figure 4.19.

First, consider L. That matrix corresponds to a linear transformation of R3. If L is a pure diagonal matrix, then we have a map that expands and/or contracts along the x-, y-, or z-axis. For example, the map in Figure 4.19(a) sends the point (x,y,z) to the point (x,7y,z), which expands everything by a factor of 7 in the y direction.

A lower triangular matrix causes what is called a shear. What this means is that the map corresponds to sliding the world along a line while expanding or contracting in a possibly not constant manner along a family of lines not parallel to the first line. The same thing holds for upper triangular matrices. For example, consider the matrix M in Figure 4.19(b). The point (x,y,z) gets sent to (x + 3y,y,z). Points get moved horizontally. The bigger the y-coordinate is, the more the point is moved. Note that this map is really an extension of a map of the plane.

Next, consider the map in Figure 4.19(c). This map sends the point (x,y,z) to (x - 1,y + 3,z + 5) and is just a simple translation by the vector (-1,3,5). The map in Figure 4.19(d) sends the homogeneous point (x,y,z,1) to (x,7y,z,5); in other words, (x,y,z) is sent to (x/5,7y/5,z/5), that is, the expansion by 7 in the y direction from Figure 4.19(a) combined with a global scaling by a factor of 1/5 that comes from S. Finally, the map in Figure 4.19(e) sends (x,y,z) to (x/(2x + 3y + 1), y/(2x + 3y + 1), z/(2x + 3y + 1)). The plane 2x + 3y + 1 = 0 gets sent to infinity. The map is a two-point perspective map with vanishing points for lines parallel to the x- or y-axes.
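The five maps just described can be tried out with a short sketch. The matrices below are not copied from Figure 4.19; they are reconstructed from the formulas in the text under the assumption that points are row vectors multiplied on the right by the matrix, so that T occupies the bottom row and P the right-hand column. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def transform(point, m):
    """Apply a 4 x 4 homogeneous matrix to a 3D point (row-vector convention)."""
    row = [Fraction(c) for c in point] + [Fraction(1)]
    x, y, z, w = (sum(row[i] * m[i][j] for i in range(4)) for j in range(4))
    if w == 0:
        raise ZeroDivisionError("point is mapped to infinity")
    return (x / w, y / w, z / w)


# Assumed matrices, chosen to reproduce the formulas quoted in the text.
EXPAND_Y = [[1, 0, 0, 0], [0, 7, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]       # (a)
SHEAR = [[1, 0, 0, 0], [3, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]          # (b)
TRANSLATE = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [-1, 3, 5, 1]]     # (c)
SCALE = [[1, 0, 0, 0], [0, 7, 0, 0], [0, 0, 1, 0], [0, 0, 0, 5]]          # (d)
PERSPECTIVE = [[1, 0, 0, 2], [0, 1, 0, 3], [0, 0, 1, 0], [0, 0, 0, 1]]    # (e)

p = (1, 1, 1)
for name, m in [("expand", EXPAND_Y), ("shear", SHEAR), ("translate", TRANSLATE),
                ("scale", SCALE), ("perspective", PERSPECTIVE)]:
    print(name, transform(p, m))
```

A point on the plane 2x + 3y + 1 = 0, such as (1,-1,0), makes `transform` raise an error under the last matrix, which is the computational counterpart of being sent to infinity.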

We finish this section by describing a way to visualize homogeneous coordinates and why some caution should be exercised when using them.

The standard embedding of R3 in P3 maps (x,y,z) to [x,y,z,1]. This means that we can use the space of 4-tuples, that is, R4, to help us visualize P3. More precisely, since the lines through the origin correspond in a one-to-one fashion with the points of P3, we can use the plane w = 1 in R4 to represent the real points of P3. Furthermore, if someone talks about a point p1 with homogeneous coordinates (x,y,z,w), then we can pretty much deal with p1 as if it actually were that 4-tuple in R4. We need to remember, however, that if p1 lies on a line through the origin and a point A on the plane w = 1, then p1 and A will represent the same point of P3. See Figure 4.20. Now, once one decides to use homogeneous coordinates for a graphics problem, although one usually starts out with a representative like A, after one has applied several transformations (represented by 4 x 4 matrices), one may not assume that the 4-tuple one ends up with will again lie on the plane w = 1. Although one could continually project back down to the w = 1 plane, that would be awkward. It is simpler to let our new points float around in R4 and only worry about projecting back to the “real” world at the end. There will be no problems as long as we deal with individual points. Problems can arise though as soon as we deal with nondiscrete sets.

Figure 4.20. The w = 1 plane in R4.

In affine geometry, segments, for example, are completely determined by their endpoints and one can maintain complete information about a segment simply by keeping track of its endpoints. More generally, in affine geometry, the boundary of objects usually determines a well-defined “inside,” and once we know what has happened to the boundary we know what happened to its “inside.” A circle in the plane divides the plane into two parts, the “inside,” which is the bounded part, and the “outside,” which is the unbounded part. This is not the case in projective geometry, where it is not always clear what is “inside” or “outside” of a set. Analogies with the circle and sphere make this point clearer. Two points on a circle divide the circle into two curvilinear segments. Which is the “inside” of the two points? A circle divides a sphere into two curvilinear disks. Which is the “interior” of the circle?

Here is how one can get into trouble when one uses homogeneous coordinates with segments. Again, consider Figure 4.20 and the segment corresponding to the “real” points A and B. The figure shows that at least with some choices of representatives, namely, p1 and p2, nothing strange happens. The segment [p1,p2] in R4 projects onto the segment [A,B] and so the points

$$t\,p_1 + (1-t)\,p_2, \qquad t \in [0,1],$$

represent the same points of P3 as the points of [A,B]. It would appear as if one can deal with segments in projective space by simply using the ordinary 4-tuple segments in R4. But what if we used p1' = a p1 instead, where a < 0? See Figure 4.21. In that case, the segment [p1',p2] projects to the exterior segment on A and B and so determines different points in P3 from [A,B]. The only way to avoid this problem would be to ensure that the w-coordinate of all the points of our objects always stayed positive as they got mapped around. Unfortunately, this is not always feasible.
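The difficulty is easy to reproduce. In the sketch below the points A = (0,0,0) and B = (1,0,0), the positive multiples chosen as representatives, and the sample parameter values are all arbitrary choices made for illustration. The segment in R4 is interpolated linearly and each sample is projected back to the plane w = 1.

```python
from fractions import Fraction


def lerp(p, q, t):
    """Point t*p + (1 - t)*q on the segment [p, q] in R4."""
    return tuple(t * a + (1 - t) * b for a, b in zip(p, q))


def to_cartesian(p):
    """Project a 4-tuple with nonzero w back to the plane w = 1."""
    *xyz, w = p
    return tuple(c / w for c in xyz)


A_h = (0, 0, 0, 1)
B_h = (1, 0, 0, 1)
p1 = tuple(2 * c for c in A_h)   # a representative of A with w > 0
p2 = tuple(3 * c for c in B_h)   # a representative of B with w > 0
p1_neg = tuple(-c for c in p1)   # another representative of A, now with w < 0

# Sample parameters; t = 3/5 is skipped because w vanishes there for p1_neg.
ts = [Fraction(k, 10) for k in (1, 3, 5, 8, 9)]
good = [to_cartesian(lerp(p1, p2, t))[0] for t in ts]
bad = [to_cartesian(lerp(p1_neg, p2, t))[0] for t in ts]

print("positive representatives:", [float(x) for x in good])
print("mixed-sign representatives:", [float(x) for x in bad])
```

With both w-coordinates positive, every sample has an x-coordinate between 0 and 1. With mixed signs, every sample lies outside that interval, on the exterior segment.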

Figure 4.21. Problems with homogeneous representatives for points.

The Projections in OpenGL

In OpenGL one needs to specify the viewing volume. This is done in the way indicated in Figure 4.22. Note that there is no view plane as such and that the z-axis does not point in the direction one is looking but in the opposite direction. The view volume is specified by the far clipping plane z = -f and a rectangle [l,r] x [b,t] in the near clipping plane z = -n.

The steps in OpenGL are

(1)    Convert to camera coordinates.

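(2)    Map the homogeneous camera coordinates to homogeneous clip space, but this time one maps into the cube [-1,1] x [-1,1] x [-1,1]. The homogeneous matrix M that does this is defined by the equation

$$nM = \begin{pmatrix} \dfrac{2n}{r-l} & 0 & 0 & 0 \\ 0 & \dfrac{2n}{t-b} & 0 & 0 \\ \dfrac{r+l}{r-l} & \dfrac{t+b}{t-b} & -\dfrac{f+n}{f-n} & -1 \\ 0 & 0 & -\dfrac{2fn}{f-n} & 0 \end{pmatrix}$$

(written here for row vectors multiplied on the right by the matrix). The matrix M is obtained in the same manner as was the corresponding camera-to-clip-space matrix in Section 4.5. The call to the function glFrustum(l,r,b,t,n,f) in OpenGL generates the matrix nM.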

(3)    Project to normalized device coordinates in Euclidean space (division by w).

(4)    Transform to the viewport.
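Steps (2) and (3) can be checked with a small sketch. It builds the matrix that the OpenGL documentation gives for glFrustum, transposed here because we assume points are row vectors multiplied on the right by the matrix, and confirms that corners of the view volume land on corners of the cube. The function names and the particular view volume are illustrative choices, not part of OpenGL.

```python
def frustum_matrix(l, r, b, t, n, f):
    """Transpose of the documented glFrustum matrix (row-vector convention)."""
    return [
        [2 * n / (r - l), 0, 0, 0],
        [0, 2 * n / (t - b), 0, 0],
        [(r + l) / (r - l), (t + b) / (t - b), -(f + n) / (f - n), -1],
        [0, 0, -2 * f * n / (f - n), 0],
    ]


def to_ndc(point, m):
    """Map camera coordinates to clip space, then divide by w (steps 2 and 3)."""
    row = [*point, 1]
    h = [sum(row[i] * m[i][j] for i in range(4)) for j in range(4)]
    return tuple(c / h[3] for c in h[:3])


# Arbitrary view volume: near rectangle [-2, 2] x [-1, 1], n = 1, f = 5.
m = frustum_matrix(-2, 2, -1, 1, 1, 5)

near_corner = (-2, -1, -1)   # lower-left corner of the near rectangle
far_corner = (10, 5, -5)     # upper-right corner of the far rectangle (scaled by f/n)
print(to_ndc(near_corner, m))
print(to_ndc(far_corner, m))
```

The near lower-left corner goes to (-1,-1,-1) and the far upper-right corner to (1,1,1), so the truncated pyramid is carried onto the cube as stated.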

Figure 4.22. The OpenGL viewing volume.

Reconstruction

Ignoring clipping, as we shall in this section, the use of homogeneous coordinates basically reduced the mathematics in our discussion of the graphics pipeline to an equation of the form

$$aM = b, \tag{4.13}$$

where M was a 4 x 3 matrix, a ∈ R4, and b ∈ R3. The given quantities were the matrix M, computed from the given camera, and a point in the world that determined a. We then used equation (4.13) to compute b and the corresponding point in the view plane. Our goal here is to give a brief look at basic aspects of two types of inverse problems. For additional details see [PenP86]. For a much more thorough and mathematical discussion of this section’s topic see [FauL01].

The Object Reconstruction Problem. Can one determine the point in the world knowing one or more points in the view plane to which it projected with respect to a given camera or cameras?

The Camera Calibration Problem. Can one determine the world-to-view-plane transformation if we know some world points and where they get mapped in the view plane?

Engineers have long used two-dimensional drawings of orthogonal projections of three-dimensional objects to describe these objects. The human brain is quite adept at doing this but the mathematics behind this or the more general problem of reconstructing objects from two-dimensional projections using arbitrary projective transformations is not at all easy. Lots of work has been done to come up with efficient solutions, even in what might seem like the simpler case of orthographic views. See, for example, [ShiS98]. Given three orthographic views of a point (x,y,z), say a front, side, and top view, one would get six constraints on the three values x, y, and z. Such overconstrained systems, where the values themselves might not be totally accurate in practice, are typical in reconstruction problems and the best that one can hope for is a best approximation to the answer.

Before describing solutions to our two reconstruction problems, we need to address a complication related to homogeneous coordinates. If we consider projective space as equivalence classes [x] of real tuples x, then mathematically we are really dealing with a map

$$T: P^3 \to P^2, \qquad T(p) = q. \tag{4.14}$$

Equation (4.13) had simply replaced equation (4.14) with an equation of representatives a, M, and b for p, T, and q, respectively. The representatives are only unique up to scalar multiple. If we are given p and T and want to determine q, then we are free to choose any representatives for p and T. The problems in this section, however, involve solving for p given T and b or solving for T given p and q. In these cases, we cannot assume that the representatives in equation (4.13) all have a certain form. We must allow for the scalar multiple in the choice of representatives at least to some degree. Fortunately, however, the equations can be rewritten in a more convenient form that eliminates any explicit reference to such scalar multiples. It turns out that we can always concentrate the scalar multiple in the “b” vector of equation (4.13). Therefore, rather than choosing the usual representative of the form b = (b1,b2,1) for [b], we can allow for scalar multiples by expressing the representative in the form

$$b = c\,(b_1, b_2, 1) = (c\,b_1,\; c\,b_2,\; c).$$

Let m1, m2, and m3 be the column vectors of M. Equation (4.13) now becomes

$$(a \cdot m_1,\; a \cdot m_2,\; a \cdot m_3) = (c\,b_1,\; c\,b_2,\; c).$$

It follows that c = a · m3. Substituting for c and moving everything to the left, equation (4.13) can be replaced by the equations

$$(m_1 - b_1 m_3) \cdot a = 0, \qquad (m_2 - b_2 m_3) \cdot a = 0. \tag{4.16}$$

It is this form of equation (4.13) that will be used in computations below. These equations have a scalar multiple for b built into them.

After these preliminaries, we proceed to a solution for the first of our two reconstruction problems. The object reconstruction problem is basically a question of whether equation (4.13) determines a if M and b are known. Obviously, a single point b is not enough because that would only determine a ray from the camera and provide no depth information. If we assume that we know the projection of a point with respect to two cameras, then we shall get two equations

$$aM' = b', \qquad aM'' = b''. \tag{4.17}$$

At this point we run into the scalar multiple problem for homogeneous coordinates discussed above. In the present case we may assume that M' = (m'ij) and M'' = (m''ij) are two fixed predetermined representatives for our projections and that we are looking for a normalized tuple a = (a1,a2,a3,1) as long as we allow a scalar multiple ambiguity in b' = c'(b'1,b'2,1) and b'' = c''(b''1,b''2,1). Expressing equations (4.17) in the form (4.16) leads, after some rewriting, to the matrix equation

$$(a_1, a_2, a_3)\,A = d,$$

where

$$A = \begin{pmatrix} m'_{11} - b'_1 m'_{13} & m'_{12} - b'_2 m'_{13} & m''_{11} - b''_1 m''_{13} & m''_{12} - b''_2 m''_{13} \\ m'_{21} - b'_1 m'_{23} & m'_{22} - b'_2 m'_{23} & m''_{21} - b''_1 m''_{23} & m''_{22} - b''_2 m''_{23} \\ m'_{31} - b'_1 m'_{33} & m'_{32} - b'_2 m'_{33} & m''_{31} - b''_1 m''_{33} & m''_{32} - b''_2 m''_{33} \end{pmatrix}$$

and

$$d = \left(b'_1 m'_{43} - m'_{41},\;\; b'_2 m'_{43} - m'_{42},\;\; b''_1 m''_{43} - m''_{41},\;\; b''_2 m''_{43} - m''_{42}\right).$$

This gives four equations in three unknowns. Such an overdetermined system does not have a solution in general; however, if we can ensure that the matrix A has rank three, then there is a least squares approximation solution

$$(a_1, a_2, a_3) = d\,A^{+}$$

using the generalized matrix inverse A+ (see Theorem 1.11.6 in [AgoM05]).
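A small numerical sketch may make the procedure concrete. The two cameras below are hypothetical 4 x 3 matrices acting on row vectors, and the world point is arbitrary. Instead of forming the generalized inverse explicitly, the sketch solves the normal equations by Gaussian elimination, which yields the same least squares solution when the system has rank three; this substitution is an assumption of the example, not the method of the text.

```python
def project(a, m):
    """Row vector a (length 4) times a 4 x 3 matrix, then divide by the last entry."""
    h = [sum(a[i] * m[i][j] for i in range(4)) for j in range(3)]
    return (h[0] / h[2], h[1] / h[2])


def constraint_rows(m, b):
    """Two linear constraints on (a1, a2, a3) from one camera, as in (4.16) with a4 = 1."""
    rows, rhs = [], []
    for col, bk in ((0, b[0]), (1, b[1])):
        rows.append([m[i][col] - bk * m[i][2] for i in range(3)])
        rhs.append(bk * m[3][2] - m[3][col])
    return rows, rhs


def solve_least_squares(c, d):
    """Solve the normal equations (C^T C) x = C^T d; assumes C has full column rank."""
    n = len(c[0])
    aug = [[sum(r[i] * r[j] for r in c) for j in range(n)]
           + [sum(r[i] * di for r, di in zip(c, d))] for i in range(n)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            aug[i] = [x - factor * y for x, y in zip(aug[i], aug[k])]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (aug[k][n] - sum(aug[k][j] * x[j] for j in range(k + 1, n))) / aug[k][k]
    return x


def reconstruct(cameras, images):
    """Stack the constraints from all cameras and solve for (a1, a2, a3)."""
    rows, rhs = [], []
    for m, b in zip(cameras, images):
        r, d = constraint_rows(m, b)
        rows += r
        rhs += d
    return solve_least_squares(rows, rhs)


# Hypothetical cameras: one at the origin, one shifted by 1 along the x-axis.
CAM1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
CAM2 = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, 0, 0]]
world = (2.0, 3.0, 4.0, 1.0)
images = [project(world, CAM1), project(world, CAM2)]
result = reconstruct([CAM1, CAM2], images)
print(result)
```

With exact image points the four equations are consistent and the world point is recovered up to rounding. With noisy image points the same code returns the least squares compromise.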

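Next, consider the camera calibration problem. Mathematically, the problem is to compute M if equation (4.13) holds for known points ai and bi, i = 1, 2, ..., k. This time around, we cannot normalize the ai and shall assume that ai = (ai1,ai2,ai3,ai4) and bi = ci(bi1,bi2,1). It is convenient to rewrite equations (4.16) in the form

$$a_i \cdot m_1 - b_{i1}\,(a_i \cdot m_3) = 0, \qquad a_i \cdot m_2 - b_{i2}\,(a_i \cdot m_3) = 0. \tag{4.20}$$

We leave it as an exercise to show that equations (4.20) can be written in matrix form as

$$nA = 0,$$

where

$$n = (m_{11}, m_{21}, m_{31}, m_{41}, m_{12}, m_{22}, m_{32}, m_{42}, m_{13}, m_{23}, m_{33}, m_{43})$$

and

$$A = \begin{pmatrix} a_1^T & 0 & \cdots & a_k^T & 0 \\ 0 & a_1^T & \cdots & 0 & a_k^T \\ -b_{11}\,a_1^T & -b_{12}\,a_1^T & \cdots & -b_{k1}\,a_k^T & -b_{k2}\,a_k^T \end{pmatrix}.$$

Here each block is a column of four entries, so that A is a 12 x 2k matrix.

This overdetermined homogeneous system in twelve unknowns mij will again have a least squares approximation solution that can be found with the aid of a generalized inverse provided that n is not zero.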
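The matrix form can be sanity-checked without solving anything. The sketch below assumes that the unknowns are ordered column by column, n = (m11, m21, m31, m41, m12, ..., m43), and that each correspondence contributes two columns to A. It uses a hypothetical camera matrix and arbitrary world points, and confirms in exact arithmetic that the true entries of M satisfy every equation. Actually recovering M from the data would require a least squares routine for homogeneous systems, which is not shown here.

```python
from fractions import Fraction


def project(a, m):
    """Row vector a (length 4) times a 4 x 3 matrix, then divide by the last entry."""
    h = [sum(a[i] * m[i][j] for i in range(4)) for j in range(3)]
    return (h[0] / h[2], h[1] / h[2])


def calibration_columns(a, b):
    """The two columns of A contributed by one correspondence a -> (b1, b2)."""
    b1, b2 = b
    zeros = [0] * 4
    col1 = list(a) + zeros + [-b1 * ai for ai in a]
    col2 = zeros + list(a) + [-b2 * ai for ai in a]
    return col1, col2


def stack_columns(m):
    """n = (m11, m21, m31, m41, m12, ..., m43): the columns of m, one after another."""
    return [m[i][j] for j in range(3) for i in range(4)]


M_TRUE = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, 2, 3]]  # hypothetical camera
WORLD = [(0, 0, 0, 1), (1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1),
         (1, 1, 0, 1), (1, 0, 1, 1), (2, 4, 2, 2)]       # last point is not normalized

n_vec = stack_columns(M_TRUE)
residuals = []
for a in WORLD:
    a = [Fraction(c) for c in a]
    for col in calibration_columns(a, project(a, M_TRUE)):
        residuals.append(sum(nk * ck for nk, ck in zip(n_vec, col)))

print(residuals)
```

Every residual is zero, and multiplying M_TRUE by any nonzero scalar leaves them zero, which is why the solution is only determined up to scale.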
