Digital Signal Processing Reference
In-Depth Information
document. These segments captured the spatial relationships among visual words.
Some good segments are sifted from bad ones for each discovered object class.
Verbeek et al. [
52
] proposed two aspect-based spatial field models by combin-
ing pLSA/LDA with Markov Random Fields (MRF). One is based on averaging
over forests of minimal spanning trees linking neighboring image regions. A tree-
structure prior is imposed to the object class labels
Z
j
=
{
z
ji
}
of image patches in
image
j
,
exp
∑
i
ψ
(
z
ji
,
z
j
χ
(
i
)
)+
log θ
j
P
(
Z
j
)
∝
,
(3.11)
where
χ
(
i
)
is the unique parent of patch
i
in the tree, and
ψ
(
z
ji
,
z
j
χ
(
i
)
)
is a pair-wise
potential,
˙
ψ
(
z
ji
,
z
j
χ
(
i
)
)=
ρ
[
z
ji
=
z
j
χ
(
i
)
]
.
(3.12)
The other model applies an efficient chain-based Expectation Propagation
method for regular 8-neighbor Markov Random Fields. The prior over
Z
j
is given by
exp
i
∼
i
ψ
(
z
ji
,
z
ji
)+
log θ
j
P
(
Z
j
)
∝
,
(3.13)
i
enumerates spatial neighbor patches
i
,
i
in image
j
. MRF captures the
local spatial dependence of image patches. These two models were trained using
either patch-level labels or image-level labels. Tested on 240 images of nine object
categories from the MSRC data set, when trained using patch-level labels, they
achieved object segmentation accuracy of 80
where
i
∼
.
2% and when trained using image-
level labels, the accuracy of 78
.
1% was achieved. The accuracies of pLSA were
78
0% respectively under these two settings. The similar idea was also
explored in [
58
] and a Dirichlet process mixture was introduced to automatically
learn the number of object classes from data. This framework was extended to
Conditional Random Field (CRF) [
4
] to integrated both local and global features in
the images [
53
,
59
].
Sudderth et al. [
60
] proposed a Transformed Dirichlet Process (TDP) model
to jointly solve the problem of scene classification and object segmentation. This
approach coupled topic models with spatial transformations and consistently ac-
counted for geometric constraints. The spatial relationships of different parts of
objects were explicitly modeled under a hierarchical Bayesian model. Cao et al.
[
61
] proposed a Spatially Coherent Latent Topic Model (Spatial-LTM) to simulta-
neously classify scene categories and segment objects. It oversegmented images into
regions of coherent latent topic model and coherent latent topic model was consid-
ered as visual words. It enforced the spatial coherency of the model by requiring that
only one single latent-topic was assigned to the image patches within each region.
.
5% and 74
.