Cue Interpretation and Propagation: Flat versus Nonflat Visual Surfaces (Computer Vision) Part 2

Cue Interpretation For Nonflat Surfaces

So far we have described how cues are integrated at a single location and how they can be propagated to other locations along a flat or slowly curving surface. In the general case of nonflat surfaces, we have to deal with the fact that visual cues for a surface property may also depend on surface shape. Consider, for example, a texture foreshortening cue for slant. On a nonflat surface, this cue depends on both surface slant and shape. Thus, to interpret this cue in terms of slant, the effect of shape must be disentangled from the effect of slant. Furthermore, if several cues are present at the same location on a nonflat surface, they are no longer independent because of the explaining-away phenomenon. For instance, different cues for slant can jointly depend on surface shape. This violates the assumptions of linear cue combination theory and can make cue integration a difficult problem.
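To make the dependence of a foreshortening cue on shape concrete, here is a toy one-dimensional sketch. It is not the stimulus model used in the experiments. It assumes orthographic projection, a sinusoidal corrugation added to a slanted plane, and image positions sampled uniformly over one corrugation period; the function name and the parameter `ripple_gradient` are hypothetical.

```python
import math

def naive_planar_slant_deg(true_slant_deg, ripple_gradient, n_samples=10_000):
    """Slant recovered from mean texture foreshortening when the surface
    is (wrongly) assumed to be planar."""
    base_gradient = math.tan(math.radians(true_slant_deg))
    total = 0.0
    for i in range(n_samples):
        phase = 2.0 * math.pi * i / n_samples
        # Local depth gradient = gradient due to slant + gradient due to shape.
        local_gradient = base_gradient + ripple_gradient * math.cos(phase)
        # Under orthographic projection, a small texture element is
        # foreshortened by the cosine of the local surface angle.
        total += math.cos(math.atan(local_gradient))
    mean_foreshortening = total / n_samples
    # Planar interpretation: foreshortening = cos(slant).
    return math.degrees(math.acos(mean_foreshortening))

flat = naive_planar_slant_deg(30.0, ripple_gradient=0.0)
corrugated = naive_planar_slant_deg(30.0, ripple_gradient=0.5)
print(f"flat surface:       {flat:.2f} deg")
print(f"corrugated surface: {corrugated:.2f} deg")
```

For the flat surface the planar reading recovers the true slant. For the corrugated surface the same reading is biased, which is the sense in which the effect of shape has to be separated from the effect of slant.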

One solution to this problem is a modular approach. Just as in cue integration, where each cue is processed independently by a separate module, we can assign separate modules to the estimation of surface shape and slant. Then, in a slant module, for example, the information about shape would be represented with a fixed prior. A limited interaction would take place after estimation is finished, when the modules exchange information in order to update their priors.


Another observation that supports a modular approach is that cues for shape can be quite complicated and specific. Consider, for example, a shading cue for shape. This cue depends on the light source direction, the light spectrum, the surface reflectance properties, and so forth. It is difficult to imagine a mechanism for interaction between this specific cue and the cues for surface slant. On the other hand, if shape is represented in a cue-independent form (such as surface curvature) at the output of a shape-processing module, it would be much easier to incorporate such a representation into the interpretation of slant cues. Thus, a simplicity principle suggests that surface shape should be estimated in isolation and then used for the proper interpretation and integration of surface cues. Moreover, according to the principles described in the second section, shape information can be used to propagate sparse cues along nonflat surfaces. In other words, estimation of shape can help to automatically interpolate surface cues in locations where they are sparse.

A perceptual mechanism for shape processing has yet to be studied; here we aim to establish whether the human visual system can use shape information to interpret surface cues properly. To investigate this issue, we conducted two psychophysical experiments that are described in detail in Ivanchenko (2006). These experiments used a slant discrimination task to analyze the role of shape cues in subject performance. The visual cues were shading and texture. Note that these cues are affected by both the slant and the shape of a surface. In other words, the cues for slant depend on surface shape and vice versa. Stimulus shape was represented with corrugated surfaces that were planar on a large scale and had a pattern of roughly vertical ridges on a small scale. Surfaces were rendered in three conditions: texture, shading, and combined cues (see Figure 5.3).



Figure 5.3 Cue conditions on a slant discrimination task. Texture (left) was generated by a reaction-diffusion process and was statistically uniform. Shading (middle) was a mix of diffuse and ambient components with a point light source placed above the surface. A combined cue condition (right) had both cues. The surface shape was a mixture of two-dimensional Gaussians that were aligned on a grid and had a roughly vertical orientation.

Note that stimuli in the shading condition have very weak cues for slant, and there is a bias toward a fronto-parallel interpretation. To reduce this bias, we used a slant discrimination task in which each trial consisted of two sequential presentations of a surface in the same condition. In each trial, subjects had to pick the surface that had the greater slant. The subjects' performance was summarized with a psychometric function.
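One common way to model such a psychometric function is a cumulative Gaussian; the sketch below illustrates that choice and does not describe the fitting procedure used in the experiments. It assumes independent Gaussian noise with the same standard deviation in each of the two presentations and a 75% correct criterion for the threshold; the function names and numeric values are hypothetical.

```python
from statistics import NormalDist

def p_choose_second(delta_slant_deg, sigma_deg):
    """Probability of judging the second surface as more slanted when its
    slant exceeds that of the first by delta_slant_deg."""
    # Two independent noisy slant estimates are compared, so the noise on
    # their difference has standard deviation sigma * sqrt(2).
    return NormalDist(mu=0.0, sigma=sigma_deg * 2 ** 0.5).cdf(delta_slant_deg)

def threshold(sigma_deg, criterion=0.75):
    """Slant difference at which the psychometric function reaches the criterion."""
    return NormalDist(mu=0.0, sigma=sigma_deg * 2 ** 0.5).inv_cdf(criterion)

for sigma in (5.0, 10.0):
    print(f"sigma = {sigma:4.1f} deg -> threshold = {threshold(sigma):.2f} deg")
```

Under these assumptions, doubling the noise standard deviation doubles the threshold, so a measured threshold can stand in for the noise level of the underlying slant estimate.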

In order to understand how the visual system processes slant information, we compared subject performance with the performance that was expected from a linear cue combination. As we show below, a linear model assumes that shape is not estimated when estimating slant. Alternatively, the visual system can use the information about surface shape for slant estimation. To measure performance on a discrimination task, we calculated a threshold of a psychometric function. Assuming Gaussian noise, the threshold is proportional to the standard deviation of the noise. According to a linear rule, the thresholds in the texture ($\sigma_{tex}$) and shading ($\sigma_{shad}$) conditions can be used to calculate the expected linear threshold ($\sigma_{linear}$) in a combined condition:

$$\sigma_{linear}^{2} = \frac{\sigma_{tex}^{2}\,\sigma_{shad}^{2}}{\sigma_{tex}^{2} + \sigma_{shad}^{2}}$$
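As a worked example, the standard linear (inverse-variance) rule can be sketched as follows. The sketch assumes that thresholds combine as 1/σ_linear² = 1/σ_tex² + 1/σ_shad², which is the usual form of linear cue combination; the function name and the threshold values are hypothetical and are not data from the experiments.

```python
import math

def linear_threshold(sigma_tex, sigma_shad):
    """Expected combined-cue threshold under linear cue combination."""
    # Reliabilities (inverse variances) of independent cues add.
    combined_variance = (sigma_tex ** 2 * sigma_shad ** 2) / (sigma_tex ** 2 + sigma_shad ** 2)
    return math.sqrt(combined_variance)

# Hypothetical single-cue thresholds, in degrees of slant.
sigma_tex, sigma_shad = 3.0, 4.0
sigma_linear = linear_threshold(sigma_tex, sigma_shad)
print(f"expected linear threshold: {sigma_linear:.2f} deg")  # 2.40 deg
```

The predicted threshold is always lower than either single-cue threshold. A measured combined-cue threshold below this prediction is therefore evidence of something beyond linear combination of the two slant cues.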

Note that such a rule is applicable only if the visual system ignores the shape (e.g., a planar approximation is used instead). If the visual system uses shape estimates to inform the estimation of slant, the linear rule is suboptimal. This is because the linear rule applied to slant discrimination describes the integration of two slant cues but ignores their interaction with shape cues. In the combined-cue condition, the overall accuracy of shape estimation increases, which can further improve slant estimation, provided that the two processes interact. We found that the mean threshold in the combined-cue condition was significantly lower than the expected linear threshold (t-test, p = 0.046). Thus, subjects performed significantly better than expected from a linear rule, indicating that the shape and slant estimation processes interact.

To obtain additional evidence about the role of shape cues in slant discrimination, we conducted a second experiment in which we decoupled the texture and shading cues. Specifically, we made the shading cue uninformative for interpreting cues for slant. This was achieved by a horizontal shift of the texture projection in the image. Despite such a drastic manipulation, the cue conflict was hardly noticeable, and the texture was still perceived as attached to the visual surface depicted by shading. After this manipulation, the threshold in the combined-cue condition increased significantly compared to the one measured in the first experiment (t-test, p = 0.015). Because our stimulus manipulation created a cue conflict for shape cues but not for slant cues, we concluded that accurate shape information was crucial for the interpretation of slant cues. Overall, the results of the two psychophysical experiments suggest that using shape information improves subject performance on a slant discrimination task, and that the absence of consistent shape information significantly degrades it. We concluded that shape cues are used by the human visual system to properly interpret slant cues.

Conclusions

We found many similarities between a computational model of information propagation and the underlying principles of cue integration. We also found that a cue conflict theory can be used to improve information propagation into areas with weak and noisy cues. Using this improvement, we obtained dense 3D surface reconstructions from a stereo pair with sparse strong cues. Importantly, the shape of the 3D reconstruction depended on the form of a prior constraint that helped to interpolate information along flat or slowly curving surfaces. For nonflat surfaces, we showed that the human visual system uses shape information to increase the informativeness of visual cues for slant. This is consistent with the hypothesis that surface shape is estimated in isolation to facilitate the interpretation of other cues and to interpolate them at locations where cues are sparse.
