3D and Spatiotemporal Interpolation in Object and Surface Formation (Computer Vision) Part 1

Introduction

This observation applies not only to conceptual differences in the kinds of questions researchers must ask, but also to different levels of visual processing. Vision researchers have made great progress in understanding early cortical filtering. At the opposite end, research has revealed some areas in which high-level representations reside, such as those for objects or faces. Between these levels, however, there is a considerable gap. This gap in “middle vision” involves all of Marr’s levels: the understanding of information for computing representations of contours, surfaces, and objects; the representations and processes involved; and the sites and roles of cortical areas. Although Marr emphasized that these levels have substantial independence in terms of the questions they pose, the lag in understanding what goes on “in the middle” is also related to interactions among these levels. Understanding the task and information paves the way for process descriptions. Similarly, detailed hypotheses about processes and representations guide meaningful neurophysiological investigations.

Fundamental to the middle game in vision are three-dimensional (3D) representations. What is the shape of a surface? How do we represent the shapes of 3D objects and obtain these representations from incomplete and fragmentary projections of an object to the eyes? How do we obtain descriptions of objects and surfaces in ordinary environments, where the views of most objects are partly obstructed by other objects, and visible areas change in complex ways as objects and observers move?

Although many traditional approaches to vision have sought to discover how meaningful perceptual representations can be derived by inference from static, two-dimensional (2D) images, it has become increasingly clear that human vision both uses complex 3D and spatiotemporal information as inputs and constructs 3D surface representations as outputs. Human vision may exploit shortcuts for some tasks, but 3D surface representations play many important roles both in our comprehension of the world and in our ability to interact with it.

In this topic, we consider several lines of research aimed at improving our understanding of 3D and spatiotemporal surface and object formation. In particular, we are concerned with how surface or object representations are achieved when the visual system must interpolate across spatial and spatiotemporal gaps in the input. The human visual system possesses remarkable mechanisms for recovering coherent object and surface representations from fragmentary input: object and surface perception depend on interpolation processes that overcome gaps in contours and surfaces in 2D, 3D, and spatiotemporal displays. Recent research suggests that these mechanisms are deeply related in that they exploit common geometric regularities.

Some Phenomena of Visual Interpolation

In ordinary perception, partial occlusion of objects and surfaces is pervasive. Panels (a)-(d) on the left side of Figure 10.1, for example, show views of a house occluded by a fence. Even in a single static view, we are able to get some representation of the scene behind the fence. If the several views were seen in sequence by a walking observer, we would get a remarkably complete representation, as suggested by Figure 10.1e.

Perceiving whole objects and continuous surfaces requires perceptual processes that connect visible regions across gaps in the input to achieve accurate representations of unity and shape. These have most often been studied for static 2D representations. Yet perception grapples with a 3D world and produces, in part, truly 3D representations of object contours and surfaces. Furthermore, when objects or observers move, the visible regions of objects change over time, complicating the requirements of object formation. The system deals with fragmentation, not only in space, but across time as well. Thus, we may think of contour and surface perception in the real world as a mapping from information arrayed across four dimensions (three spatial dimensions and time) into 3D spatial representations. If motion is represented, visual object and surface formation is a mapping from fragmented four-dimensional (4D) inputs into coherent, functionally meaningful, 4D representations.

Categories of Interpolation Phenomena

A number of phenomena involve connecting visible contours and surfaces across gaps (Figure 10.2). Figure 10.2a shows partial occlusion. Six noncontiguous blue regions appear, yet your visual system connects them into a single object extending behind the black occluder. The object's overall shape is apparent. Perceptual organization of this scene also leads to the perception of circular apertures in the black surface, through which the blue object and a more distant white surface are seen. Figure 10.2b illustrates the related phenomenon of illusory contours or illusory objects. Here, the visual system connects contours across gaps to create the central white figure that appears in front of other surfaces in the array. Figure 10.2c shows a transparency version of an illusory figure; the figure is created, but one can also see through it. Finally, in Figure 10.2d, a uniform black region is seen to split into two visible figures, a phenomenon that has been called self-splitting objects.

FIGURE 10.1 Real-world interpolation requires integration over time and space. Frames (a,b,c,d): Several images of an occluded real-world scene. The porch of this house is visible between the fence posts and is perceived as a series of connected visual units despite the fact that shape information is fragmented in the retinal projection. Frame (e): When motion and three-dimensional contour and surface interpolation operate, the visual system can generate a far more complete representation of the scene, of the sort depicted here.

FIGURE 10.2 Four perceptual phenomena that can be explained by the same contour interpolation process. (a) A partially occluded object. The blue fragments are spatially disconnected, but we perceive them as part of the same object. (b) The same shape appears as an illusory figure and is defined by six circles with regions removed. (c) A bistable figure that can appear either as a transparent blue surface in front of six circles or as an opaque blue surface seen through six circular windows. (d) A self-splitting object. The homogeneous black region is divided into two shapes. This figure is bistable because the two shapes appear to reverse depth ordering over time.

These phenomena are formally similar in that the same physically specified contours of the central figure are given in each case, and the completed object in each case is defined by the same collection of physically specified and interpolated contours. (Figure 10.2d includes only the corresponding interpolations in the middle part of the figure.)

Contour and Surface Processes

Evidence suggests two kinds of mechanisms for connecting visible areas across gaps: contour interpolation and surface interpolation. These processes can be distinguished because they operate in different circumstances and depend on different variables (see Figure 10.3). Contour interpolation depends on geometric relations of visible contour segments that lead into contour junctions. Surface interpolation in 2D displays can occur in the absence of contour segments or junctions; it depends on the similarity of lightness, color, or texture of visible surface patches.

Figure 10.3 Contour and surface interpolation. (a) The three black regions appear as one object behind the gray occluder. Both contour and surface interpolation processes are engaged by this display. (b) Contour interpolation alone. By changing the surface colors of visible regions, surface interpolation is blocked. However, the relations of contours still engage contour interpolation, leading to some perceived unity of the object. (c) Surface interpolation alone. By disrupting contour relatability, contour interpolation is blocked. Due to surface interpolation, there is still some impression that the three fragments connect behind the occluder. (d) With both contour and surface interpolation disrupted, blue, yellow, and black regions appear as three separate objects.

Figure 10.4 Illustration of two-dimensional surface interpolation. The circular areas in the display do not trigger contour processes, due to the absence of tangent discontinuities. Surface interpolation causes some circular areas to appear as holes in the occluder rather than as spots in front. Two dots in (a) are changed in color in (b), causing a difference in their appearance (e.g., the yellow spot in (a), when turned white, becomes a hole due to its relation with the color of the surround). Relations of contour and surface interpolation are shown by blue spots appearing as holes if they fall within interpolated (or extrapolated) contours of the blue display.

Figure 10.4 illustrates the action of surface interpolation. Some of the circles in the display, such as the yellow ones, appear as spots on the surface. In contrast, most of the blue circles appear to be part of a single, occluded, blue figure, visible through holes. The white spots also appear to be holes rather than spots; through them, the white background surface is seen. These perceptual experiences arise from the surface interpolation process. Visible regions are connected across gaps in the input based on the similarity of their surface qualities (e.g., lightness, color, and texture). These connections cannot be given by contour interpolation, as the circles have no contour junctions. Certain rules govern surface interpolation; for example, it is confined by real and interpolated edges (Yin, Kellman, and Shipley, 2000). In the figure, note that the rightmost circle does not link up with the occluded object. This result occurs because that dot does not fall within real or interpolated contours of the blue object. Whereas contour interpolation processes are relatively insensitive to relations of lightness or color, the surface process depends crucially on these. Notice that the yellow dot on the lower left does not appear as part of the occluded object, despite being within the interpolated and real contours of the blue object.
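The two constraints just described (spreading confined by real and interpolated edges, and dependence on surface similarity) can be caricatured in a few lines of code. The sketch below is a toy illustration of our own, not the model of Yin, Kellman, and Shipley (2000). It assumes that each circular region is reduced to a center point and an RGB color, that the occluded object's real-plus-interpolated boundary has already been supplied as a polygon by a separate contour process, and that "similar surface quality" can be approximated by a Euclidean RGB tolerance. The function names, the tolerance, and the three output labels are hypothetical.

```python
import math


def point_in_polygon(pt, poly):
    """Ray-casting test: is pt inside the closed polygon poly?
    (Points lying exactly on an edge may be classified either way.)"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def similar(c1, c2, tol=30.0):
    """Hypothetical similarity rule: RGB colors within a Euclidean tolerance."""
    return math.dist(c1, c2) <= tol


def classify_spot(center, color, object_boundary, object_color, background_color):
    """Toy surface-interpolation rule for one circular region.

    object_boundary: polygon of the occluded object's real plus interpolated
    contours, assumed to come from a separate contour-interpolation step.
    """
    # Spreading of the object's surface is confined by its real and interpolated edges.
    if similar(color, object_color) and point_in_polygon(center, object_boundary):
        return "hole onto occluded object"
    # Simplifying assumption: the background surface extends everywhere behind the occluder.
    if similar(color, background_color):
        return "hole onto background"
    return "spot on occluder"
```

With a square boundary, a blue circle at the center is labeled a hole onto the occluded object, while the same blue circle placed outside the boundary becomes a spot on the occluder (like the rightmost circle in Figure 10.4), and a yellow circle inside the boundary is also a spot, because color similarity fails.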

This phenomenon of surface interpolation under occlusion appears to be one of a family of surface spreading or “filling-in” phenomena, such as the color-spreading phenomena studied by Yarbus (1967) and filling-in across the blind spot.

A Model of Contour Interpolation in Static 2D Scenes

Complementary processes of contour and surface interpolation work in concert to connect object fragments across gaps in the retinal image and recover the shape of occluded objects (e.g., Grossberg and Mingolla, 1985; Kellman and Shipley, 1991). Interpolated boundaries of objects, whether occluded or illusory, constrain spreading of surfaces across unspecified regions in the image, even if the interpolated boundaries are not connected to others (Yin, Kellman, and Shipley, 1997, 2000). Here, we briefly review the context for developing a 4D model of contour interpolation and surface perception.

The Geometry of Visual Interpolation

A primary question in understanding visual object and surface formation is this: what stimulus relationships cause it to occur? Answering this question is fundamental in several respects. It allows us to understand the nature of visual interpolation: some visible fragments get connected, whereas others do not. Discovering the geometric relations and related stimulus conditions that lead to object formation is analogous to understanding the grammar of a language (e.g., what constitutes a well-formed sentence). Understanding at this level is also crucial for appreciating the deepest links between the physical world and our mental representations of it. Characterizing the stimulus relations leading to object formation is at first descriptive, but as unifying principles are revealed, they help us to relate the information used by the visual system to the physical laws governing the projection of surfaces to the eyes, whether these are deep constraints about the way the world works (e.g., Gibson, 1979; Marr, 1982) or scene statistics (e.g., Geisler et al., 2001).

Initiating Conditions for Interpolation

An important fact about contour interpolation is that the locations of interpolated contours are highly restricted in visual scenes. In general, interpolated contours begin and end at junctions or corners in visible contours (tangent discontinuities), that is, at locations where contours have no unique orientation (Shipley and Kellman, 1990; Rubin, 2001). Some have suggested that second-order discontinuities (points that are first-order continuous but mark a change in curvature) might also weakly trigger interpolation (Shipley and Kellman, 1990; Albert and Hoffman, 2000; Albert and Tse, 2000; Albert, 2001; for recent discussion see Kellman, Garrigan, and Shipley, 2005). Tangent discontinuities arise from the optics of how occluded objects project to the eyes: it can be proven that the optical projection of one object occluding another will contain these image features (Kellman and Shipley, 1991). Shipley and Kellman (1990) documented this regularity and showed that removing tangent discontinuities eliminated or markedly reduced contour interpolation. Heitger et al. (1992) called tangent discontinuities "key points" and proposed a neurally plausible model for their extraction from images. The presence or absence of tangent discontinuities can be manipulated in illusory contour images by rounding the corners of inducing elements, which weakens contour interpolation (e.g., Albert and Hoffman, 2000; Kellman et al., 2005; Shipley and Kellman, 1990; Palmer, Kellman, and Shipley, 2006).
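To illustrate why rounding a corner removes the trigger for interpolation, the sketch below flags candidate tangent discontinuities on a contour sampled as an open polyline: vertices where the heading changes abruptly by more than a threshold. This is a toy stand-in of our own, not Heitger et al.'s (1992) key-point operator, which works on filtered images. The 30° threshold and the function names are hypothetical, and a smooth curve sampled too coarsely can produce false positives.

```python
import math


def turning_angles(points):
    """Signed change of heading (radians) at each interior vertex of an open polyline."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        h_in = math.atan2(y1 - y0, x1 - x0)    # heading of incoming segment
        h_out = math.atan2(y2 - y1, x2 - x1)   # heading of outgoing segment
        d = (h_out - h_in + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        angles.append(d)
    return angles


def tangent_discontinuities(points, threshold_deg=30.0):
    """Indices of vertices whose heading jumps by more than threshold_deg (hypothetical value)."""
    thr = math.radians(threshold_deg)
    return [i + 1 for i, a in enumerate(turning_angles(points)) if abs(a) > thr]


if __name__ == "__main__":
    # A sharp 90-degree corner at vertex index 2.
    sharp = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    print(tangent_discontinuities(sharp))  # [2]

    # The same 90-degree turn, rounded into a quarter circle sampled in ten steps.
    arc = [(1 + math.cos(-math.pi / 2 + k * math.pi / 20),
            1 + math.sin(-math.pi / 2 + k * math.pi / 20)) for k in range(1, 11)]
    rounded = [(0, 0), (1, 0)] + arc + [(2, 2)]
    print(tangent_discontinuities(rounded))  # []
```

The total change in orientation is the same 90° in both contours; rounding spreads it over many small steps, so no single point lacks a well-defined orientation.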

Contour Relatability

What determines which visible contour fragments get connected to form objects? Although tangent discontinuities are ordinarily necessary conditions for contour interpolation, they are not sufficient. After all, many corners in images are corners of objects, not points at which some contour passes behind an intervening surface (or in front, as in illusory contours).

Contour interpolation depends crucially on geometric relations of visible contour fragments, specifically the relative positions and orientations of pairs of edges leading into points of tangent discontinuity. These relations have been described formally in terms of contour relatability (Kellman and Shipley, 1991; Singh and Hoffman, 1999a). Relatability is a mathematical notion that defines a categorical distinction between edges that can connect by interpolation and those that cannot (see Kellman and Shipley, 1991, 175-177). The key idea in contour relatability is smoothness (i.e., interpolated contours are differentiable at least once), but it also incorporates monotonicity (interpolated contours bend in only one direction) and a 90° limit (interpolated contours bend through no more than 90°).
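Because relatability is categorical, it can be written as a simple yes/no predicate. The sketch below uses the angular form of the criterion given for Figure 10.5b: the first edge ends at the origin with orientation 0, and the second ends at (x, y) with orientation θ (in radians). The requirement x > 0, the mirroring of y < 0 into the upper half-plane, and the function name are our own assumptions for illustration.

```python
import math


def relatable(x: float, y: float, theta: float) -> bool:
    """Angular relatability test in the Figure 10.5b frame (a sketch).

    Edge 1 ends at (0, 0) with orientation 0; edge 2 ends at (x, y) with
    orientation theta in radians, assumed to lie in (-pi, pi].
    """
    if x <= 0:
        # With heading confined to [0, pi/2], a smooth interpolant that leaves
        # the origin along +x can never return to x <= 0.
        return False
    if y < 0:
        y, theta = -y, -theta  # mirror into the upper half-plane
    # Smoothness + monotonicity: theta must be at least the chord angle.
    # 90-degree limit: theta may not exceed pi/2.
    return math.atan2(y, x) <= theta <= math.pi / 2
```

For example, collinear edges (y = 0, θ = 0) are relatable; an endpoint at (1, 1) is relatable for θ = 60° but not for θ ≈ 6° (the contour would have to bend both ways) or for any θ beyond 90°.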

FIGURE 10.5 Contour relatability describes formally a categorical distinction between edges that can be connected by visual interpolation and those that cannot. (a) Geometric construction defining contour relatability (see text). (b) Alternative expression of relatability. Given one visible contour fragment terminating in a contour junction at (0,0) and having orientation 0°, and a second fragment terminating at (x,y), those orientations θ of the second fragment that satisfy $\tan^{-1}(y/x) \le \theta \le \pi/2$ are relatable. In the diagram, these are shown with solid lines, whereas nonrelatable orientations are shown with dotted lines.

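Figure 10.5 shows a construction that is useful in defining contour relatability. Formally, if E1 and E2 are surface edges, R and r are perpendicular to these edges at points of tangent discontinuity (each measured from its tangent discontinuity to the point where the two perpendiculars intersect, with R the longer of the two), and Φ is the angle at which R and r intersect, then E1 and E2 are relatable if and only if

$$0 \le R\cos\Phi \le r.$$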
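The perpendicular-based criterion 0 ≤ R cos Φ ≤ r of Kellman and Shipley (1991), where Φ is the angle between the perpendiculars R and r and R is the longer of the two, can be checked against the angular form in the Figure 10.5b caption. The worked derivation below is our own check, not taken from the cited sources. It assumes the canonical frame (E1 ends at the origin heading along +x; E2 ends at (x, y) with x > 0, y ≥ 0, and orientation 0 < θ ≤ π/2) and writes a and b for the perpendiculars at E1 and E2 before deciding which one is R.

```latex
% Assumed frame: E_1 ends at (0,0), orientation 0; E_2 ends at (x,y), x>0, y\ge 0,
% orientation \theta with 0<\theta\le\pi/2. a, b: perpendiculars at E_1, E_2,
% measured to their intersection point P.
\begin{aligned}
P &= \bigl(0,\; y + x\cot\theta\bigr), \qquad
a = y + x\cot\theta, \qquad
b = \frac{x}{\sin\theta}, \qquad
\Phi = \theta,\\[4pt]
a\cos\Phi \le b
  &\iff y\sin\theta\cos\theta + x\cos^{2}\theta \le x
   \iff y\cos\theta \le x\sin\theta
   \iff \theta \ge \tan^{-1}(y/x),\\[4pt]
b\cos\Phi \le a
  &\iff x\cot\theta \le y + x\cot\theta
   \qquad (\text{always true for } y \ge 0),\\[4pt]
0 \le R\cos\Phi
  &\iff \cos\theta \ge 0 \iff \theta \le \pi/2 .
\end{aligned}
```

The second inequality always holds, and the first holds automatically whenever a < b (since a cos Φ ≤ a < b). Requiring R cos Φ ≤ r with R the longer perpendicular is therefore equivalent to θ ≥ tan⁻¹(y/x), while the left inequality expresses the 90° limit. Together they reproduce tan⁻¹(y/x) ≤ θ ≤ π/2.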

Although the precise shape of interpolated contours is a matter of some disagreement, two properties of relatability cohere naturally with a particular class of contour shapes. First, it can be shown that interpolated edges meeting the relatability criteria can always be composed of one constant-curvature segment and one zero-curvature segment. Second, this shape of interpolated edge appears to be a minimum curvature solution in that it has the lowest maximum curvature: any other first-order continuous curve will have at least one point of greater curvature (see Skeath, 1991, in Kellman and Shipley, 1991). This is a slightly different notion of minimum curvature than minimum energy.
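The first property can be made concrete with a small construction. In the canonical frame of Figure 10.5b, the sketch below solves two linear equations for a circular arc (constant curvature) and a straight segment (zero curvature) that join the two edge endpoints with matching tangents. Which piece comes first depends on the geometry, mirroring which perpendicular is longer. This is our own illustration under the assumptions x > 0, y ≥ 0, 0 < θ ≤ π/2; it does not verify the lowest-maximum-curvature property, and the names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ArcLineContour:
    order: str          # "arc_then_line" or "line_then_arc"
    radius: float       # radius of the constant-curvature piece (curvature = 1/radius)
    line_length: float  # length of the zero-curvature piece


def arc_line_interpolant(x: float, y: float, theta: float) -> ArcLineContour:
    """Start at (0,0) heading along +x; end at (x,y) heading theta (radians)."""
    if not (0 < theta <= math.pi / 2 and x > 0 and y >= 0):
        raise ValueError("outside the canonical frame assumed by this sketch")
    c, s = math.cos(theta), math.sin(theta)
    k = 1.0 - c
    if k == 0.0:
        raise ValueError("theta too close to 0 for this sketch")
    # Case 1: arc first, then straight segment along theta.
    #   radius*s + line*c = x,   radius*k + line*s = y
    radius = (x * s - y * c) / k
    line = (y * s - x * k) / k
    if radius >= 0 and line >= 0:
        return ArcLineContour("arc_then_line", radius, line)
    # Case 2: straight segment along +x first, then arc.
    #   line + radius*s = x,     radius*k = y
    radius = y / k
    line = x - radius * s
    if line >= 0:
        return ArcLineContour("line_then_arc", radius, line)
    raise ValueError("edges are not relatable")


def endpoint(contour: ArcLineContour, theta: float):
    """Where the constructed contour ends (used to check the construction)."""
    c, s = math.cos(theta), math.sin(theta)
    if contour.order == "arc_then_line":
        ax, ay = contour.radius * s, contour.radius * (1 - c)
        return ax + contour.line_length * c, ay + contour.line_length * s
    return contour.line_length + contour.radius * s, contour.radius * (1 - c)
```

For example, with the second endpoint at (2, 1) and θ = 90°, the construction is a straight segment of length 1 followed by a quarter circle of radius 1; at (1, 2) the order reverses (quarter circle, then straight segment). Nonrelatable inputs, such as (1, 1) with θ ≈ 17°, raise an error because neither case has nonnegative lengths.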