Overview
This topic is about constructive geometry. Our object is to study geometry, all sorts of geometry, and also to present a setting in which to carry out this investigation on a computer. The adjective "constructive" is intended to imply that we will not be satisfied with an answer to a geometric problem unless we also are given a well-defined algorithm for computing it. We need this if we are going to use a computer. However, even though our goal is a computational one, our journey will take us through some beautiful areas of pure mathematics that will provide us with elegant solutions to problems and give us new insights into the geometry of the world around us. A reader who joins us in this journey and stays with us to the end will either have implemented a sophisticated geometric modeling program in the process or at least will know how to implement one.
Figure 1.1 shows the task before us at a very top level. We have a number of representation problems. Our starting point is the "rear’ world and its geometry, but the only way to get our hands on that is to build a mathematical model. The standard approach is to represent "real" objects as subsets of Euclidean space. Since higher dimensional objects are also interesting, we shall not restrict ourselves to subsets of 3-space. On the other hand, we are not interested in studying all possible subsets. In this topic, we concentrate on the class of objects called finite polyhedra. More exotic spaces such as fractals (the spaces one encounters when one is dealing with certain natural phenomena) will only be covered briefly. They certainly are part of what we are calling geometric modeling, but including them would involve a large amount of mathematics of a quite different flavor from that on which we wish to concentrate here. Next, after representing the world mathematically, we need to turn the (continuous) mathematical representations into finite or discrete representations to make them accessible to computers. In fact, one usually also wants to display objects on a monitor or printer, so there are further steps that involve some other implementation and computation issues before we are done.
Figure 1.1. The geometric modeling representation pipeline.
As we look at the various representation problems shown in Figure 1.1, note that, although we have only mentioned objects so far, representations also need to represent the maps (operations, etc.) between them because a good and complete model of something needs to mimic everything in the original. In any case, objects and maps go hand in hand in mathematics. With every new class of objects it is fruitful to define the naturally associated maps (take vector spaces and linear transformations, for example).
To summarize, the emphasis of this topic is on showing how to model finite poly-hedra and the invariants associated to them on a computer and we shall show how to set up a programming environment to facilitate this investigation. One has a fairly good grip on the mathematics part of the representation pipeline, but less so on the rest, at least in terms of having a well-defined theoretical approach. The fact is that, although computer graphics is an exciting, rapidly developing field that has come a long way from the early days when people first tried to use computers for this, things are still being done in rather ad hoc ways. There is really no overall systematic approach, only a lot of isolated, special results that, neat as some of the ideas and algorithms may be, do not fit into any unifying picture. To put it another way, computer graphics today is an “art” and not a “science.” There have been a few attempts to formalize the digital geometry aspect. See [Fium89] or [Herm98], for example. On the other hand, since the nonmathematical part of computer graphics depends on the current technology used for the display medium (raster devices at present) and, of course, the computer, and since this will continually evolve (with holographic displays the next dominant medium perhaps), the hardcore part of “computer” graphics may stay an art and never become a science.
All that we shall do in this topic is get a few preliminaries out of the way. We shall introduce some basic terminology and indicate some of the mathematics we shall need. What little we have to say about hardware topics will be found in this topic.
The Basic Graphics Pipeline
Any meaningful use of a computer to study geometry implies that we ultimately want to display objects on a graphics device. Figure 1.2 shows some standard terminology for the first step of the three-dimensional (3d) graphics pipeline that takes us from the mathematical representation of an object in R3 to its image on the device. Objects in the world are described by the user with respect to a world coordinate system. The world is then projected onto a view plane from some viewpoint that we shall think of as the location of a camera or the eye. We have an associated view plane and camera coordinate system.
Figure 1.2. 3d graphics coordinate systems and terminology.
Looking from the viewpoint along the positive z-axis of the camera coordinate system specifies the view direction. A window in the view plane specifies the area of interest. The view volume or view pyramid is the infinite volume swept out by the rays starting at the viewpoint and passing through points of the window. To limit the output of objects one often uses a near (or front or hither) and far (or back or yon) clipping plane. The volume inside the view volume between these two planes is called the truncated view volume or truncated view pyramid. Only those parts of objects that lie in this volume and project into the window will be displayed. Finding those parts of an object is referred to as clipping. In principle, the three coordinate systems – the world, the camera, and the view plane coordinate system – could be distinct. In practice, however, one assumes that the coordinate axes of the camera and view plane coordinate system are parallel and the z-axes are perpendicular to the view plane. One also assumes that their x- and y-axes are parallel to the sides of the window.
The final step in mapping an object to a graphics device involves a map that transforms view plane coordinates to physical device coordinates. This is usually thought of as a two-stage process. First, an initial map transforms the window to a viewport that is a subrectangle of a fixed rectangle called the logical screen, and a second map then transforms logical screen coordinates to physical device coordinates. See Figure 1.3. Sometimes the logical screen is already defined in terms of these coordinates, so that the second map is not needed. Other times, it is set equal to a standard fixed rectangle such as the unit square [0,1] x [0,1], in which case we say that the viewport is specified in normalized device coordinates (NDC).
Figure 1.3. The window-to-device pipeline.
Figure 1.4. The basic 3d graphics pipeline.
The two-dimensional graphics pipeline is similar but much simpler. The window-to-device pipeline shown in Figure 1.3 stays the same, but Figures 1.2 and 1.4 get replaced by Figures 1.5 and 1.6, respectively. We have a two-dimensional world coordinate system and a window whose edges are parallel to the coordinate axes. In the case of the three-dimensional graphics pipeline, one usually assumes that the window is of a fixed size centered on the z-axis of the camera coordinate system. This is adequate to achieve most views of the world. To move the viewpoint and change the view direction we simply change the camera coordinate system. Zooming in and out is accomplished by moving the view plane further from or closer to the viewpoint. In the two-dimensional graphics case, on the other hand, one must allow the window to move and change in size. We have to be able to move the window to see different parts of the two-dimensional world and we must be able to shrink or expand the size of the window to zoom.
One word of caution is in order. The distinction between “window” and “viewport” is often blurred and, sometimes, what should be called a viewport is called a window. The terms used are not as important as the conceptual difference. One specifies what one sees in user coordinates and the other specifies where one sees it. The window, as defined above, refers to the former and the viewport, to the latter.
Figure 1.5. 2d graphics coordinate system and window.
Figure 1.6. The basic 2d graphics pipeline.
Hardware Basics
Although the goal of this topic is to emphasize the abstract ideas in graphics, one does need to understand a few hardware basics because that is what drives the search for efficient algorithms for the implementation of low-level graphics primitives. The most common display devices have been cathode ray tube (CRT) devices. Here an electron beam traces out an image on a phosphor-coated screen. There have been different types of CRTs, but since the early 1970s raster scan CRTs have been the most prevalent graphics display devices. They are refresh CRTs because the electron beam is continually rescanning the entire screen. The screen itself should be thought of as a rectangular array of dots. The image that one sees depends on how those dots are lit. The beam starts at the top of the screen and proceeds down the screen from one scan line to the next until it gets to the bottom. It then jumps back to the top. See Figure 1.7. The term “horizontal retrace” refers to the time the beam jumps from the end of a line to the beginning of the next line and “vertical retrace” refers to the time it jumps from the right bottom corner of the screen to the top left corner. These times, especially the latter, were often used to write to the screen to avoid flicker and knowing them was important to game developers who wanted to produce smooth animation effects.
Another display technology that has been becoming more and more popular in recent years is the liquid crystal display (LCD). Although there are different variants, LCDs are also raster scan devices because, for all practical purposes, they consist of a rectangular array of dots that is refreshed one row at a time. The dots themselves are the “liquid crystals,” which are usually organic compounds that consist of molecules that allow light to pass through them if they are aligned properly by means of an applied voltage. The bottom line is that the liquid crystals can be individually switched on or off. LCDs have a number of advantages over the raster scan CRTs. In particular, one does not have to worry about refresh rates or flicker and they are not as bulky.
Figure 1.7. The raster scan CRT.
The hardware assumption made in this topic, one that should apply to two-dimensional displays in the foreseeable future, is that the reader is working on a raster scan device. This assumption has an important consequence. Raster scan devices use a refresh buffer to specify which dots on the screen are to be lit and how. To get the picture we want, we only have to set the values in that buffer correctly. Therefore, our abstract representation problem specializes to representing subsets of Euclidean space as (discrete) subsets of a rectangle in Z2. Less formally, we shall talk about representing objects in a “raster.” A raster refers to a two-dimensional rectangular array of pixels, where a pixel is an abbreviation for “picture element,” which could, in theory, be any value. In practice, a pixel is represented in computer memory by one or more bits that specify a color. A monochrome picture is where each pixel is represented by only one bit. A row in a raster is called a scan line. If the raster has m columns and n rows, then we say that the resolution of the picture is m x n.
The hardware graphics standards for computers have evolved over time. The standards for the IBM personal computer (PC) are listed in chronological order below:
Type |
Resolution |
Number of colors |
CGA |
640 x 200 |
2 (black plus one other) |
Hercules |
720 x 348 |
2 (black and white) |
EGA |
640 x 350 |
16 |
VGA |
640 x 480 |
16 |
super VGA |
For more details about these standards see [Wilt87] or [Ferr94].
The refresh buffer of a raster scan device is usually called a frame buffer. In general, the term “frame buffer” refers to an array of memory (separate from main memory) thought of as a two-dimensional array of pixels (a raster). Frame buffers serve two functions:
(1) as a place where the image is stored as it is computed
(2) as a refresh buffer from which the image is displayed on a raster device
A frame buffer is an interface between what are usually relatively slow graphics computations and the high data rate video image display. In the typical personal computer the frame buffer is located on the graphics card that manages the video subsystem of the computer. It basically used to be not much more than some extra memory. For example, the table below describes the frame buffers used by the IBM PC family of computers:
Type |
Size of frame buffer |
Starting memory address (in hexadecimal) |
CGA |
16 K |
B800:0000 |
Hercules |
64 K |
B000:0000 |
EGA,VGA |
256K for 16 colors |
accessed via a 64K window starting at A000:0000 |
super VGA |
1 M or more |
accessed via a 64K window starting at A000:0000 |
Over time the graphics subsystems of personal computers have become more powerful, and the hardware is supporting more and more of the operations that one needs to perform in graphics, such as antialiasing (Section 2.6) and the bit map operations discussed below and in Section 2.10.
As indicated above, displaying objects on the computer screen involves writing to the frame buffer. This amounts to storing values in memory. Ordinarily, a store operation replaces the value that was there. In the case of frame buffers one has more options. If A is a location in memory, then let [A] denote the content of A. Frame buffers typically support store operations of the form (V op [A]) ^ [A], where V is a new value and op is a binary logical operator that operates on a bit-by-bit basis. Typical binary logical operations on bits are or, and, xor, and replace. The statement (V replace [A]) ^ [A] corresponds to the standard store operation where the new value replaces the old one. When a frame buffer uses a store operation corresponding to an operator op, we shall say that it is in op mode. For example, we may talk about being in xor mode.
As a simple example of how having various modes for a frame buffer can be useful, consider how the standard quick and dirty method used to move a cursor around on the screen without destroying the background uses the xor mode. The method relies on xor’s well-known property that
What this means is that if one xor’s the same value to a memory location twice in a row, then that memory location will hold its original value at the end. Now, a straightforward way to move a cursor on the screen without erasing what is there would be to save the area first before writing the cursor to it and then restoring the old value after the cursor has moved. This would be very time consuming. There is a much better way of using the xor mode. Assume that the cursor starts out at some initial position defined by a variable oldA. Now switch into xor mode and repeat the following three steps as often as desired:
Draw cursor at oldA (this will erase the cursor)
Draw cursor at new position newA Replace the value in oldA with that in newA
Note that replace mode would cause this loop to erase everything in the cursor’s path and leave behind a trail of the cursor. There is one disadvantage with the xor operation, however, which may not make it a viable option in certain situations. Although one can use it to move objects around on the screen without destroying the background, the objects may change color. If, for example, one wants to move a red cursor and have it stay red, then this is not possible with xor mode because the cursor will assume different colors as it moves over differently colored areas of the screen. Therefore, if it is important that the cursor stay red, then there is no simple alternative to first saving the area to which one is writing and restoring it afterwards.
Because the availability of logical operators in store operations simplifies and speeds up many useful graphics operations, current graphics systems have built-in hardware support for them. We will have more to say about this in Section 2.10.
We finish this section with two more terms one sees frequently in graphics. Scan conversion is the act of converting points, lines, other geometric figures, functions, etc., into the raster data structure used in frame buffers one scan line at a time. After a scene is modeled, it needs to be “rendered.” To render a scene means to construct an image on a display device that is visually satisfactory. What is “satisfactory” depends firstly on the device and its constraints and secondly on what one is trying to do. To emphasize the position that rendering occupies in graphics, keep in mind that the modeling or mathematical representation comes first and then the rendering. Any given model can have many different renderings. For example, a sphere can be rendered in different colors. In trying to render scenes one runs into a number of important problems: visible line or surface determination, illumination, texturing, transparency, etc. These will all be addressed in coming topics.