3D and Spatiotemporal Interpolation in Object and Surface Formation (Computer Vision) Part 2

One Object or Two?

Relatability defines a categorical distinction—which relative positions and orientations allow edges to be connected by contour interpolation. Such a distinction is important, as object perception often involves a discrete determination of whether two visible fragments are part of the same object or not. Figure 10.6 shows examples of relatable and nonrelatable edges, in both perception of partly occluded objects and perception of illusory objects. Complete objects are formed in the top row but not in the bottom row. Object formation has profound effects on further processing, such as generation of a representation of missing areas, generation of an overall shape description, and comparison with items or categories in memory. Research indicates that the representation of visual areas as part of a single object or different objects has many important effects on information processing (Baylis and Driver, 1993; Zemel et al., 2002; Kellman, Garrigan, and Shipley, 2005).

FIGURE 10.6 Examples of relatable and nonrelatable contours.

Quantitative Variation

Although the discrete classification of visible areas as connected or separate is important, there is also reason to believe that quantitative variation exists within the category of relatable edges (Kellman and Shipley, 1991; Banton and Levi, 1992; Shipley and Kellman, 1992a, 1992b; Field, Hayes, and Hess, 1993; Singh and Hoffman, 1999b). For example, Field, Hayes, and Hess (1993) reported experiments indicating that contour linking declines as the angular difference between edges increases, reaching a limit around 90°. Singh and Hoffman (1999a) proposed an expression for the quantitative decline of relatability with angular change.

Ecological Foundations

The notion of relatability is sometimes described as a formalization of the Gestalt principle of good continuation (Wertheimer, 1923/1938). Recent work suggests that good continuation and relatability are separate but related principles of perceptual organization (Kellman et al., 2003). Both embody underlying assumptions about contour smoothness (Marr, 1982), but they take different inputs and have different constraints. The smoothness assumptions related to both of these principles reflect important aspects of the physical world as it projects to the eyes. Studies of image statistics suggest that these principles approach optimality in matching the structure of actual contours in the world. Through an analysis of contour relationships in natural images, Geisler et al. (2001) found that the statistical regularities governing the probability of two edge elements co-occurring correlate highly with the geometry of relatability. Two visible edge segments associated with the same contour meet the mathematical relatability criterion far more often than not.

3D Contour Interpolation

Object formation processes are three-dimensional. Figure 10.7 gives an example—a stereogram that may be free-fused by crossing the eyes. One sees a vivid transparent surface with a definite 3D shape. Object formation takes as inputs 3D positions and orientations of edges and produces as outputs 3D structures (Kellman and Shipley, 1991; Carman and Welch, 1992; Kellman et al., 2005; Kellman, Garrigan, and Shipley, 2005).

Until recently, there had been no account of the stimulus conditions that produce 3D interpolation. Kellman et al. (2005) proposed that 3D interpolation might be governed by a straightforward 3D generalization of 2D relatability. As in the 2D case, interpolated contours between 3D edges must be smooth and monotonic and must bend no more than 90°. Similarly, where 3D interpolated contours meet physically given edges, the orientations of the physically given part and the interpolated part must match.

FIGURE 10.7 Three-dimensional (3D) interpolation. The display is a stereogram that may be free-fused by crossing the eyes. Specification of input edges’ positions and orientations in 3D space (here given by stereoscopic disparity) leads to creation of a vivid, connected, transparent surface bending in depth.

Formally, we define, for a given edge and any arbitrary point, the range of orientations that fall within the limits of relatability at that point. In the Cartesian coordinate system, let θ be an angle in the x-y plane, and φ an angle in the x-z plane (for simplicity, in both cases zero degrees is the orientation parallel to the x-axis). Positioning one edge with orientation θ = φ = 0 and ending at the point (0, 0, 0), and positioning a second edge at (x, y, z) somewhere in the volume with x > 0, the range of possible orientations (θ, φ) for 3D-relatable edges terminating at that point is given by

$$\tan^{-1}\left(\frac{y}{x}\right) \le \theta \le \frac{\pi}{2}$$

and

$$\tan^{-1}\left(\frac{z}{x}\right) \le \phi \le \frac{\pi}{2}.$$
As in the 2D case, we would expect quantitative variation in the strength of interpolation within these limits. The lower bounds of these equations express the absolute orientation difference (180° for two collinear edges ending in opposite directions) between the reference edge (edge at the origin) and an edge ending at the arbitrary point oriented so that its linear extension intersects the tip of the reference edge. The upper bounds incorporate the 90° constraint in three dimensions.
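The categorical limits can be written as a direct test of whether a candidate edge is 3D relatable to the reference edge: the lower bounds are tan⁻¹(y/x) for θ and tan⁻¹(z/x) for φ, and both upper bounds are 90° (π/2). The sketch below is a minimal illustration, not the authors' implementation. The function name `is_3d_relatable` is hypothetical, angles are in radians, and because these bounds are stated for one side of the reference axis, the sketch (as an assumption) handles only y ≥ 0 and z ≥ 0. It captures only the categorical limits, not the quantitative variation in interpolation strength.

```python
import math

def is_3d_relatable(x, y, z, theta, phi):
    """Categorical 3D relatability test for a second edge ending at (x, y, z).

    The reference edge ends at the origin with orientation theta = phi = 0
    (parallel to the x-axis). theta is the edge's angle in the x-y plane and
    phi its angle in the x-z plane, both in radians.
    Sketch assumption: only x > 0 and y, z >= 0 are handled.
    """
    if x <= 0:
        raise ValueError("the second edge must lie at x > 0")
    if y < 0 or z < 0:
        raise ValueError("this sketch only covers y >= 0 and z >= 0")
    # Lower bounds: the orientation whose linear extension passes through
    # the tip of the reference edge at the origin.
    theta_min = math.atan(y / x)
    phi_min = math.atan(z / x)
    # Upper bounds: the 90-degree constraint.
    limit = math.pi / 2
    return theta_min <= theta <= limit and phi_min <= phi <= limit


print(is_3d_relatable(1.0, 1.0, 0.0, math.radians(60), 0.0))  # True
print(is_3d_relatable(1.0, 1.0, 0.0, 0.0, 0.0))  # False
```

In the second call the candidate edge is parallel to the reference edge but offset from it; connecting the two would require an inflected, non-monotonic contour, so it falls below the lower bound.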

How might the categorical limits implied by the formal definition of 3D relatability be realized in neural architecture? In 2D cases, it has been suggested that interpolation occurs through lateral connections among contrast-sensitive oriented units having particular relations (Field, Hayes, and Hess, 1993; Yen and Finkel, 1998). Analogously, 3D relatability specifies a “relatability field” or volume within which relatable contour edges can be located. At every location in the volume, relatable contours must have a 3D orientation within a particular range specific to that location.
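One way to picture such a relatability field is to tabulate, for a grid of sampled locations, the orientation interval that 3D relatability allows at each one (lower bound tan⁻¹(y/x) for θ and tan⁻¹(z/x) for φ, upper bound 90°). This is a hypothetical illustration of the geometry only: the function `relatability_field` and the grid are our own, it says nothing about how neural units actually sample space or orientation, and it assumes y, z ≥ 0 as a simplification.

```python
import math

def relatability_field(xs, ys, zs):
    """Tabulate allowed orientation ranges (in degrees) at sampled locations.

    Returns a dict mapping (x, y, z) to ((theta_lo, theta_hi), (phi_lo, phi_hi)).
    Locations with x <= 0, y < 0 or z < 0 are skipped (sketch assumption).
    """
    field = {}
    for x in xs:
        if x <= 0:
            continue  # the second edge must lie ahead of the reference edge
        for y in ys:
            for z in zs:
                if y < 0 or z < 0:
                    continue
                # Lower bound: orientation pointing back at the reference tip.
                theta_lo = math.degrees(math.atan(y / x))
                phi_lo = math.degrees(math.atan(z / x))
                # Upper bound: the 90-degree constraint.
                field[(x, y, z)] = ((theta_lo, 90.0), (phi_lo, 90.0))
    return field


field = relatability_field(xs=[1.0, 2.0], ys=[0.0, 1.0, 4.0], zs=[0.0])
for loc, (theta_rng, _) in sorted(field.items()):
    width = theta_rng[1] - theta_rng[0]
    print(loc, f"theta in [{theta_rng[0]:.1f}, 90.0], width {width:.1f} deg")
```

The printed widths shrink as a location moves farther off the reference edge's axis (larger y/x), so each location in the volume admits its own band of relatable orientations.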

The relatability field suggests that, contrary to some 2D models of contour interpolation, early visual cortical areas that do not explicitly code 3D positions and contour orientations may be insufficient for the neural implementation of 3D contour interpolation (Kellman, Garrigan, and Shipley, 2005). As we discuss below, there are interesting considerations regarding exactly where the neural locus of contour interpolation may be.

Experimental Studies of Contour Interpolation

An objective performance paradigm for testing 3D contour relatability was devised by Kellman, Garrigan, Shipley, Yin, and Machado (2005) and is illustrated in Figure 10.8. In their experiments, subjects were shown stereoscopically presented 3D planes whose edges were either relatable or not. Examples of relatable and nonrelatable pairs of planes are shown in the columns of Figure 10.8. Crossed with relatability is a second factor, whether the planes are converging or parallel, shown in the rows of Figure 10.8. Subjects were asked to classify each stimulus as either parallel or converging. The logic is that, to the extent that 3D relatability leads to object formation, judging the relative orientations of 3D relatable planes should be easier than judging the relative orientations of 3D nonrelatable planes.

Kellman et al. (2005) found that subjects could make this classification more accurately and quickly when the planes were 3D relatable. This result is consistent with 3D relatability as a description of the geometric limits of 3D contour interpolation and object formation. A variety of other experiments indicated that the results depended on 3D interpolation, rather than some other variable, such as an advantage of certain geometric positions for making slant comparisons. (For details, see Kellman et al., 2005.)

FIGURE 10.8 Experimental stimuli used to test three-dimensional (3D) object formation from 3D relatability. It was predicted that sensitivity and speed in classifying displays like these as either converging or parallel would be superior for displays in which unitary objects were formed across the gaps by contour interpolation, and that object formation would be constrained by 3D relatability. Both predictions were confirmed experimentally (Kellman et al., 2005).

3D Surface Interpolation

Contour and surface processes often work in complementary fashion (Grossberg and Mingolla, 1985; Nakayama, Shimojo, and Silverman, 1989; Yin, Kellman, and Shipley, 1997, 2000; Kellman, Garrigan, and Shipley, 2005). Studies with 2D displays have shown that surface interpolation alone can link areas under occlusion based on similarity of surface quality. Surface similarity may be especially important in 2D, because all visible surface regions are confined to the same plane. In 3D, the situation is different. Here, geometric positions and orientations of visible surface patches may also be relevant.

We have recently been studying whether 3D surface interpolation depends on geometric constraints and, if so, how these relate to the constraints that determine contour interpolation. To study 3D surface interpolation apart from contour processes, we use visible surface patches that have no oriented edges. These are viewed through apertures (Figure 10.9).

FIGURE 10.9 Use of the parallel/converging method for studying three-dimensional (3D) surface interpolation. A fixation point is followed by a display in which surface patches slanted in depth are viewed through two apertures. Participants make a forced choice as to whether the visible surface patches were in parallel or converging planes.

We used a version of the parallel/converging method to study 3D surface interpolation. Displays were made of dot-texture surfaces; due to their lack of oriented edges, these surface patches could not support contour interpolation. Participants made a forced choice on each trial as to whether two surface patches, visible through apertures, lay in parallel or converging (intersecting) planes. As in 3D contour interpolation, we hypothesized that completion of a connected surface behind the occluder would facilitate accuracy and speed on this task. We also tested whether 3D relatability—applied to the orientations of surface patches rather than contours—might determine which patches were seen, and processed, as connected. 3D relatable patches were compared to displays in which one patch or the other was shifted to disrupt 3D relatability.
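Sensitivity in a single-interval parallel/converging classification is commonly summarized with the signal-detection index d' = z(H) − z(F). The text does not say which sensitivity measure was used, so this is only a generic sketch under stated assumptions: "converging" responses to converging displays count as hits, "converging" responses to parallel displays count as false alarms, and a log-linear correction keeps rates away from 0 and 1. The function name and the trial counts in the example are hypothetical.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity d' = z(H) - z(F).

    'Signal' = converging display; 'noise' = parallel display (assumption).
    Adding 0.5 to each count (log-linear correction) keeps the rates
    strictly between 0 and 1, where the inverse normal CDF is finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 50 converging and 50 parallel trials per condition.
relatable = d_prime(hits=40, misses=10, false_alarms=12, correct_rejections=38)
nonrelatable = d_prime(hits=32, misses=18, false_alarms=20, correct_rejections=30)
print(f"relatable d' = {relatable:.2f}, nonrelatable d' = {nonrelatable:.2f}")
```

With these invented counts the relatable condition yields the higher d', the pattern one would expect if completion behind the occluder aids the parallel/converging judgment.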

Figure 10.10 shows representative data on 3D surface interpolation (Fantoni et al., 2008). As predicted, 3D relatable surface patches showed sensitivity and speed advantages over nonrelatable surface patches. This effect was just as strong for vertically misaligned apertures as for vertically aligned ones. Consistent with a 90° constraint, the difference between 3D relatable and nonrelatable conditions decreased as the slant of each patch approached 45° (making their relative angle approach 90°). Many questions remain to be investigated, but these results suggest the fascinating possibility that both contour and surface interpolation in 3D share a common geometry (cf. Grimson, 1981). They may even be manifestations of some common process, although Kellman et al. (2005) showed that contour interpolation, not surface interpolation, was specifically implicated in their results.
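The arithmetic behind "approaching 45°" can be checked with plane normals. Assuming, for illustration, that each patch is rotated about a vertical axis by the same slant in opposite directions (the slant axis is not specified here), the angle between the two planes is twice the slant, so a slant of 45° puts the relative angle at the 90° limit. The helper names `plane_normal` and `relative_angle` are hypothetical.

```python
import math

def plane_normal(slant_deg):
    """Unit normal of a plane rotated by slant_deg about the vertical (y) axis,
    starting from a frontoparallel plane whose normal is the z-axis."""
    s = math.radians(slant_deg)
    return (math.sin(s), 0.0, math.cos(s))

def relative_angle(slant_a_deg, slant_b_deg):
    """Angle in degrees between two slanted planes (angle between normals)."""
    na, nb = plane_normal(slant_a_deg), plane_normal(slant_b_deg)
    dot = sum(a * b for a, b in zip(na, nb))
    dot = max(-1.0, min(1.0, dot))  # guard acos against rounding error
    return math.degrees(math.acos(dot))


for slant in (0, 15, 30, 40, 45):
    rel = relative_angle(slant, -slant)  # patches slanted in opposite directions
    print(f"slant ±{slant:>2}° -> relative angle {rel:5.1f}° "
          f"(within 90° limit: {rel <= 90.0 + 1e-9})")
```

Equal slants give a relative angle of 0°, which corresponds to a parallel display; any nonzero opposite slant makes the planes converge, and the bend needed to connect them reaches 90° at ±45°.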

The results of experiments on 3D surface interpolation support the notion that surface-based processes can operate independently of contour information, and that these processes are geometrically constrained by the 3D positions and orientations of visible surface patches. The pattern of results substantially replicates that of Kellman et al. (2005) for illusory contour displays, despite the lack of explicit bounding edges in the inducing surfaces. 3D relatability consistently affected speeded classification: performance was better for 3D relatable displays than for displays in which 3D relatability was disrupted, either by a depth shift of one surface relative to the other (violating the monotonicity constraint) or by large values of relative stereo slant (violating the 90° constraint in converging displays).

FIGURE 10.10 Three-dimensional (3D) surface interpolation data. Sensitivity (upper panels) and response times (lower panels) for 3D relatable and 3D nonrelatable surface patches in aligned (right) and misaligned (left) aperture configurations.

2D surface interpolation may constitute a special case of a more general 3D process. In 3D, the primary determinant of interpolation may be geometric relations, not similarity of surface quality. In our displays, position and orientation of surface patches seen through apertures were specified by binocular disparity, along with information from vergence. It appears that disparity provided sufficient information for the extraction of the 3D orientation of inducing patches necessary to constrain surface interpolation. The evidence suggests that contour and surface processes that surmount gaps in 3D are separable processes but rely on common geometric constraints.