The following simple model (Craft et al., 2007) explains our findings on border ownership coding and might also help us to understand how surfaces are represented by neurons (Figure 8.9). It proposes that border ownership selectivity is produced by specific neural grouping circuits (“G cells”) that integrate contour signals of V2 neurons (“B cells”) with cocircular receptive fields, and, by feedback, adjust the gain of the B cells. In this scheme, each B cell is connected to grouping circuits integrating contours on one side of its RF, so that its gain is increased if a convex shape is present on that side. The model assumes G cells with integration fields of various sizes according to a scale pyramid. It predicts border ownership assignment correctly for the various conditions. B cells model border ownership selective neurons as recorded in V2. G cells are assumed to reside in a higher-level area such as V4. As yet, their existence is hypothetical.
FIGURE 8.9 Neural network model of border ownership coding. See text for further explanation.
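The feedforward integration and gain feedback described above can be made concrete with a toy calculation. The following is only a minimal sketch under strong simplifying assumptions, not the implementation of Craft et al. (2007): it uses a single square figure, one B-cell pair per side, a single integration-field radius instead of a scale pyramid, and a multiplicative gain rule of the form drive × (1 + k × G). The names `SIDES`, `RADIUS`, `FEEDBACK_GAIN`, `b_cells`, `g_center`, and `run_model` are hypothetical and chosen for illustration.

```python
# Toy sketch of the B-cell / G-cell scheme. All constants are illustrative
# assumptions, not parameters taken from the published model.

SIDES = {                 # edge locations on a 2 x 2 square centred at the origin
    "right": (1, 0),
    "left": (-1, 0),
    "top": (0, 1),
    "bottom": (0, -1),
}
RADIUS = 1                # single integration-field radius (no scale pyramid here)
FEEDBACK_GAIN = 0.25      # hypothetical strength of the G -> B feedback


def b_cells():
    """Two B cells per edge location, preferring opposite border-ownership sides."""
    cells = []
    for edge, (x, y) in SIDES.items():
        # For a square centred at the origin, (x, y) is the outward normal of the side.
        for sign, label in ((-1, "inward"), (1, "outward")):
            cells.append({
                "edge": edge,
                "label": label,
                "pos": (x, y),
                "preferred": (sign * x, sign * y),
                "drive": 1.0,   # same feedforward contour signal for both cells of a pair
            })
    return cells


def g_center(cell):
    """Centre of the one G cell this B cell is wired to (on its preferred side)."""
    (x, y), (dx, dy) = cell["pos"], cell["preferred"]
    return (x + RADIUS * dx, y + RADIUS * dy)


def run_model(cells):
    """Return gain-modulated B-cell responses keyed by (edge, label)."""
    # Feedforward: each G cell sums the contour signals of the B cells wired to it.
    g_activity = {}
    for cell in cells:
        centre = g_center(cell)
        g_activity[centre] = g_activity.get(centre, 0.0) + cell["drive"]
    # Feedback: each G cell multiplicatively raises the gain of its B cells.
    return {
        (cell["edge"], cell["label"]):
            cell["drive"] * (1.0 + FEEDBACK_GAIN * g_activity[g_center(cell)])
        for cell in cells
    }


responses = run_model(b_cells())
print(responses[("right", "inward")], responses[("right", "outward")])  # 2.0 1.25
```

In this sketch the G cell at the centre of the square collects signals from all four sides, whereas each G cell outside the square collects from one side only. The B cell whose preferred side faces the figure therefore ends up with the higher gain, although both cells of a pair receive identical local input.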
One attraction of this model is that it not only describes figure-ground organization but can also be extended to explain how the resulting figure representations are accessed and selected by top-down attention. We assume that top-down attention excites the G cells at the focus of attention (Figure 8.10a,b). Because excitation of a G cell turns up the gain of the B cells that are connected with that G cell, the contour signals representing the figure are enhanced. The illustration shows how the occluding figure (Figure 8.10a) or the underlying figure (Figure 8.10b) can be selected. The latter cannot be accomplished with a spatial attention model, because such a model pools the occluding contour with the contours of the underlying figure (Figure 8.10c).
FIGURE 8.10 Explaining selective attention by the model of Figure 8.9. It is assumed that volitional (“top-down”) attention excites neurons in the G cell layer as illustrated by a yellow spotlight. In this model, attention enhances the correct contour segments, whether foreground (a) or background objects (b) are attended. In contrast, a spatial attention model extracts a mixture of contours from both foreground and background objects (c).
The model predicts that top-down attention should have an asymmetrical effect on the B cells: Because each B cell is connected only to G cells on one side (its preferred border ownership side), the responses of a B cell are enhanced only if the focus of attention is on that side. Experiments in which the monkey attended to a figure on one side or the other confirmed this prediction. The attention effect was asymmetrical, and the side of attentive enhancement was generally the same as the preferred side of border ownership (Qiu, Sugihara, and von der Heydt, 2007). Thus, the model accounts for three findings: how the system uses image context to generate border-ownership signals, why the attention influence is spatially asymmetrical, and why the side of attention enhancement is generally the same as the preferred side of border ownership.
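The predicted asymmetry follows directly from the one-sided wiring, as a small numerical sketch can show. The code below assumes, purely for illustration, that attention adds a fixed amount of excitation to the G cell on the attended side and that B-cell gain is 1 + k × G. The function names, the constants `FEEDBACK_GAIN` and `ATTENTION_BOOST`, and the stimulus-drive values are hypothetical and are not taken from the recordings.

```python
# Toy sketch of asymmetric attentional modulation of B cells.
# All constants are illustrative assumptions.

FEEDBACK_GAIN = 0.25     # hypothetical strength of the G -> B feedback
ATTENTION_BOOST = 2.0    # hypothetical top-down excitation of the attended G cell


def g_activity(stimulus_drive, attended_side=None):
    """G-cell activity per side: stimulus-driven grouping plus top-down excitation."""
    return {
        side: drive + (ATTENTION_BOOST if side == attended_side else 0.0)
        for side, drive in stimulus_drive.items()
    }


def b_response(preferred_side, g, feedforward=1.0):
    """A B cell is modulated only by the G cell on its preferred side."""
    return feedforward * (1.0 + FEEDBACK_GAIN * g[preferred_side])


# Grouping signal delivered by the stimulus to the G cells on either side of the edge.
stimulus_drive = {"left": 4.0, "right": 1.0}

for attended in (None, "left", "right"):
    g = g_activity(stimulus_drive, attended)
    print(attended, b_response("left", g), b_response("right", g))
# None 2.0 1.25
# left 2.5 1.25
# right 2.0 1.75
```

In this sketch the left-preferring B cell is enhanced only when attention is on the left, and its response is unchanged when attention moves to the right. The side of attentive enhancement thus coincides with the preferred side of border ownership by construction.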
We started out by considering the population response profile across the cortical representation of a uniform figure on a uniform background, noting that in both V1 and V2 the contours of the figure are strongly enhanced relative to the interior of the figure. Orientation-selective cells respond exclusively to the contours, whereas nonoriented cells are also activated by the uniform interior. One might expect that a surface attribute like color would be represented mainly by color-selective nonoriented cells, and that the shape of the figure would be represented by orientation-selective noncolor cells. However, this is not the way things are represented in the visual cortex. Orientation-selective cells are at least as color selective as nonoriented cells, and because of the preponderance of orientation-selective cells and the response enhancement at the edges, the color signals from the contours are much stronger than the color signals from the surface. This suggests that surface color is somehow computed from the contour signals.
This conclusion is further strengthened by the finding that most color-and-orientation cells signal the direction of the color gradient at the contour. Again, this is contrary to the common assumption that selectivity for contrast polarity is found only in simple cells. This is clearly not the case in the primate visual cortex. Simple cells are much less frequent in V2 than in V1 (Levitt, Kiper, and Movshon, 1994), but selectivity for contrast polarity is undiminished (Friedman, Zhou, and von der Heydt, 2003).
We saw a similar transition of coding in the case of depth: In V2, 3D edge selectivity emerges, and cells respond to the contours of cyclopean figures. Here also cells are generally selective for edge polarity, which means that they signal the depth ordering of surfaces at the contour. Thus, these neurons signal edges not only for the purpose of coding shape, for which position and orientation of edges would be sufficient, but also for representing the surface. Perhaps the 3D shape of the surface is also derived from the contour representation.
Border ownership coding is the string that ties a number of diverse findings together. It can help us to understand how the visual cortex imposes a structure on the incoming sensory information and how central mechanisms use this structure for selective processing. Border ownership is about relating contours to regions. The most interesting aspect is the global shape processing that is evident from the differential responses to displays that are identical within a large region around the receptive field (Figure 8.6a,b). It implies that the four sides of the square are assigned to a common region, which is a form of “binding,” the essence of creating object representations. In our model, we propose a mechanism with two reciprocal types of connectivity: a converging scheme for the integration of feature signals and a diverging scheme of feedback connections that set the gain of the feature neurons. This gain change is what leads to the observed difference in firing rates for figures on preferred and nonpreferred sides. Besides this effect that is caused by the visual stimulus, the feedback connections serve in top-down attention for the selective enhancement of the feature signals (B cells) representing the attended object. A large proportion of V2 neurons shows this dual influence of stimulus-driven grouping and top-down attentive modulation (Qiu, Sugihara, and von der Heydt, 2007). Moreover, the neurons show an asymmetry of attentive modulation that is correlated with the side of border ownership preference, as predicted by the model.
Our model differs from many other models of perceptual organization in that it postulates a dedicated neural circuitry for the grouping. The grouping in our model is not the result of self-organization of activity within the cortical area (which would be limited by the length and conduction velocity of horizontal connections in the cortex), but is produced by an extra set of neurons, the G cells. We assume that these cells are not part of the pattern recognition pathway, but specifically serve to provide a structure onto which central processes can latch, forming an interface between feature representation and central cognitive mechanisms. This specificity of function is an important difference: The G cells with their relatively large receptive fields do not have the high resolution required for form recognition. Therefore, the system needs only a relatively small number of G cells to cover the visual space.
I suggest that this general scheme could also explain how the brain represents surfaces. A surface representation might correspond to a set of G cells that the system activates when a task requires specific surface information. Take the example of hiking in the mountains. Here, the visual system needs to provide information about the position of the object to be stepped on, such as a rock, and the surface normal at the point to be touched by the foot. The calculation of the surface normal would involve the activation of a cluster of G cells corresponding to the location of the rock, and these cells would then selectively enhance the signals representing the contours of the rock. The enhanced contour signals would then be collected by a “surface normal calculator” downstream in the pathway. Neurons that represent the 3D orientation of lines exist in V4 (Hinkle and Connor, 2002), and from the 3D orientations of the contour elements, the system could calculate the approximate orientation and curvature of the surface and its normal. A similar process can be conceived for the computation of surface color from contour signals. Thus, I envision surface representation more like a procedure the system can call when needed than a pattern of neural activity representing the surface points in space.
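To illustrate what a “surface normal calculator” might compute from the selected contour signals, consider the simplest case of a roughly planar surface. Every contour orientation then lies in the surface plane, so the cross product of any two non-parallel orientations points along the normal. The sketch below is a purely geometric illustration under that planarity assumption. It is not a claim about the neural implementation, it does not address curvature, and the function name `surface_normal` and the test contour are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def surface_normal(orientations, eps=1e-9):
    """Estimate the normal of a roughly planar surface from 3D contour orientations.

    Each orientation is a 3D vector along a contour element; its sign is
    arbitrary, so the sign of the returned normal is arbitrary as well.
    """
    tangents = [unit(t) for t in orientations]
    total = [0.0, 0.0, 0.0]
    reference = None
    for i in range(len(tangents)):
        for j in range(i + 1, len(tangents)):
            c = cross(tangents[i], tangents[j])
            if dot(c, c) < eps:
                continue                      # (nearly) parallel pair: no information
            if reference is None:
                reference = c                 # first usable pair fixes the sign
            elif dot(c, reference) < 0:
                c = tuple(-x for x in c)      # align sign with the reference
            for k in range(3):
                total[k] += c[k]
    if reference is None:
        raise ValueError("need at least two non-parallel contour orientations")
    return unit(total)


# Contour orientations of a square patch lying in the tilted plane z = x.
rock_contour = [(1, 0, 1), (0, 1, 0), (-1, 0, -1), (0, -1, 0)]
print(surface_normal(rock_contour))
```

For this contour the estimate is parallel to (−1, 0, 1)/√2, the normal of the plane z = x. Averaging over all pairs of contour elements, rather than using a single pair, makes the estimate more tolerant of noise in the individual orientation signals.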