object labels. Support Vector Machines (SVM) and Boosting are widely used to
model the appearance of object classes. Marsalek and Schmid [ 35 ] estimated the
shape mask of an object and its object category using a nonlinear SVM with the χ²
distance. The appearance of the object within the shape mask was represented by
a histogram of visual words. Shotton et al. [ 32 ] used texton histograms and region
priors of image regions, computed from their proposed semantic texton forests, as
input to a one-vs-others SVM classifier that assigns image regions to different
object classes. Gould et al. [ 36 ] used a boosting classifier to predict
the label of each pixel. Tahir et al. [ 25 ] used Spectral Regression Kernel Discrim-
inant Analysis (SRKDA) [ 37 ] and achieved better results than SVM on PASCAL
VOC 2008 [ 6 ]. It was also much more efficient than Kernel Discriminant Analysis
(KDA). Aldavert et al. [ 38 ] proposed an integral linear classifier, which used integral
images to efficiently calculate the outputs of linear classifiers based on histograms
of visual words at the pixel level.
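As a rough illustration of the appearance classifiers discussed above, the sketch below trains an SVM with a precomputed χ² kernel on bag-of-visual-words histograms using scikit-learn. The histograms and labels are random placeholders, and the vocabulary size and kernel parameter are arbitrary; this is a minimal example of the technique, not the exact setup of [ 35 ] or [ 32 ].

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

# Placeholder data: 200 visual-word histograms over a 500-word vocabulary
# with binary object/background labels. A real system would build these
# histograms from quantized local descriptors extracted from image regions.
rng = np.random.default_rng(0)
X_train = rng.random((200, 500))
X_train /= X_train.sum(axis=1, keepdims=True)   # L1-normalize histograms
y_train = rng.integers(0, 2, size=200)

X_test = rng.random((20, 500))
X_test /= X_test.sum(axis=1, keepdims=True)

# Precompute the chi-squared kernel matrix and train an SVM on it.
K_train = chi2_kernel(X_train, gamma=0.5)
svm = SVC(kernel="precomputed", C=1.0)
svm.fit(K_train, y_train)

# At test time the kernel is evaluated between test and training histograms.
K_test = chi2_kernel(X_test, X_train, gamma=0.5)
predictions = svm.predict(K_test)
```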
3.3.2 Conditional Random Fields
Although classifiers such as SVM and Boosting can predict the object label of a
pixel based on the appearance within its neighborhood, they cannot capture local
consistency or other contextual relationships, such as the fact that "sky" appears
above buildings but not the other way around. Local appearance, local consistency,
and contextual features can be well incorporated under a Conditional Random
Field (CRF) framework.
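For concreteness, a commonly used pairwise CRF formulation (a generic sketch, not the specific model of any method cited here) scores a labeling Z of image sites given the image X as

P(Z | X) ∝ exp( − Σ_i ψ_u(z_i, X) − Σ_(i,j)∈E ψ_p(z_i, z_j, X) ),

where the unary potentials ψ_u encode local appearance (e.g., a classifier score for site i), and the pairwise potentials ψ_p over neighboring sites E encourage local consistency and can encode contextual relations such as the sky-above-building example.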
3.3.2.1 Multiscale Conditional Random Fields
He et al. [ 39 ] were the first to use CRF for semantic object segmentation. Their
proposed CRF framework is described as follows. Suppose X = {x_i} are image
patches and Z = {z_i} are their object class labels. In [ 39 ], the conditional
distribution over Z given the input X was defined by multiplicatively combining
component conditional distributions,

P(Z | X) ∝ P_C(Z | X) · P_R(Z | X) · P_G(Z | X).    (3.1)
P_C, P_R, and P_G capture statistical structure at three different spatial scales: a local
classifier, regional features, and global features (see Fig. 3.6 ).
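As a minimal illustration of the multiplicative combination in Eq. (3.1), the sketch below sums component outputs in log-space and renormalizes. For simplicity it treats every component as producing a per-patch class distribution, which is only literally true of the local classifier; the random inputs are placeholders, not the actual regional or global models of [ 39 ].

```python
import numpy as np

def combine_log_scores(log_p_local, log_p_regional, log_p_global):
    """Combine component scores as in Eq. (3.1): the product of the component
    conditional distributions becomes a sum in log-space. Each argument has
    shape (num_patches, num_classes); each output row is renormalized to a
    valid distribution."""
    log_joint = log_p_local + log_p_regional + log_p_global
    log_joint -= log_joint.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

# Placeholder component outputs for 4 patches and 3 classes.
rng = np.random.default_rng(0)
scores = [np.log(rng.dirichlet(np.ones(3), size=4)) for _ in range(3)]
posterior = combine_log_scores(*scores)
print(posterior.round(3))
```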
The local classifier P_C produces a distribution over the label z_i given its image
patch x_i as input,

P_C(Z | X, λ) = ∏_i P_C(z_i | x_i, λ),    (3.2)

where λ is the parameter of the local classifier. A 3-layer multilayer perceptron
(MLP) was used in [ 39 ].
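The sketch below uses scikit-learn's MLPClassifier as a stand-in for the local classifier P_C in Eq. (3.2). The patch features, label set, and hidden-layer size are arbitrary placeholders; this is not the exact network of [ 39 ], only an illustration of a per-patch classifier that outputs class distributions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder patch features: 1000 patches described by 64-dim vectors,
# each labeled with one of 5 object classes. Real features would be
# extracted from the image patches x_i.
rng = np.random.default_rng(0)
features = rng.random((1000, 64))
labels = rng.integers(0, 5, size=1000)

# One hidden layer gives an input/hidden/output (3-layer) perceptron,
# loosely mirroring the local classifier described above.
mlp = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0)
mlp.fit(features, labels)

# predict_proba returns a class distribution per patch, playing the role of
# P_C(z_i | x_i, lambda); the product over patches gives P_C(Z | X, lambda).
patch_distributions = mlp.predict_proba(features[:3])
print(patch_distributions.round(3))
```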