The question of interest is to recover an unknown image z ∈ Z, interpreted as a classification into a finite number K of labels, from an image y of observed intensity values. This classification usually requires values for a vector parameter θ ∈ Θ, considered in a Bayesian setting as a random variable. The idea is to approach the problem by breaking it into the three primary stages mentioned above. The first, or data, stage is concerned with the observational process or data model p(y | z, θ), which specifies the distribution of the data y given the process of interest and the relevant parameters. The second stage then describes the process model p(z | θ), conditional on parameters that are usually different but, for simplicity, still denoted by θ. Finally, the last stage accounts for the uncertainty in the parameters through a distribution p(θ). In applications, each of these stages may involve multiple sub-stages. For example, if spatial interactions are to be taken into account, the process model might be decomposed into a product of several conditional distributions suggested by neighborhood relationships. Similar decompositions are possible in the parameter stage.
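As an illustration of how the three stages combine, writing the joint distribution as the product of the stage models and applying Bayes' rule gives

p(z, θ | y) ∝ p(y | z, θ) p(z | θ) p(θ),

so that each stage can be specified, and if needed further decomposed, independently of the others, while their product defines the quantity of interest.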
Ultimately, we are interested in the distribution of the process and parameters updated by the data, that is, in the so-called posterior distribution p(z, θ | y). Because the dependencies involved are generally too complex, it is difficult to extract the parameters θ from the observed data y without explicit knowledge of the unknown true segmentation z. This problem is greatly simplified when the solution is determined within an EM framework. The EM algorithm [11] is a general technique for finding maximum likelihood solutions in the presence of missing data. It consists of two steps, usually described as the E-step, in which the expectation of the so-called complete log-likelihood is computed, and the M-step, in which this expectation is maximized over θ. An equivalent way to define EM is the following. Let D be the set of all probability distributions on Z. As discussed in [6], EM can be viewed as an alternating maximization procedure of a function F defined, for any probability distribution q ∈ D, by
F(q, θ) = Σ_{z∈Z} ln p(y, z | θ) q(z) + I[q],      (1)

where I[q] = −E_q[log q(Z)] is the entropy of q (E_q denotes the expectation with regard to q, and we use capital letters to indicate random variables while their realizations are denoted with small letters).
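To make the alternating maximization explicit, a standard reading of (1) (see, e.g., [6]) is that iteration t of EM alternates the two updates

q^(t+1) = argmax_{q∈D} F(q, θ^(t)), attained at q^(t+1)(z) = p(z | y, θ^(t)),

θ^(t+1) = argmax_{θ∈Θ} F(q^(t+1), θ) = argmax_{θ∈Θ} Σ_{z∈Z} ln p(y, z | θ) q^(t+1)(z),

the entropy term I[q] playing no role in the second update since it does not depend on θ. The first update is the usual E-step and the second the usual M-step.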
When prior knowledge on the parameters is available, the Bayesian setting consists in replacing maximum likelihood estimation by a maximum a posteriori (MAP) estimation of θ that uses the prior knowledge encoded in the distribution p(θ). The maximum likelihood estimate of θ, i.e. θ̂ = argmax_{θ∈Θ} p(y | θ), is replaced by θ̂ = argmax_{θ∈Θ} p(θ | y). The EM algorithm can also be used to maximize the posterior distribution. Indeed, the likelihood p(y | θ) and F(q, θ) are linked through

log p(y | θ) = F(q, θ) + KL(q, p),

where KL(q, p) is the Kullback-Leibler divergence between q and the conditional distribution p(z | y, θ), and is non-negative:

KL(q, p) = Σ_{z∈Z} q(z) log [ q(z) / p(z | y, θ) ].
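Since KL(q, p) is non-negative, F(q, θ) is a lower bound on log p(y | θ), with equality exactly when q(z) = p(z | y, θ); maximizing F over q thus recovers the E-step above. For MAP estimation, the same decomposition applies to log p(θ | y) up to the constant log p(y), so that the M-step maximizes E_q[ln p(y, Z | θ)] + ln p(θ) rather than E_q[ln p(y, Z | θ)] alone.

As a purely illustrative sketch of this generic EM recipe, and not of the model developed in this work, the following Python code runs the E/M alternation for a one-dimensional, K-class independent Gaussian mixture, i.e. a data model with no spatial interactions and a flat prior on θ; the function and variable names are ours.

import numpy as np

def em_gaussian_mixture(y, K, n_iter=50, seed=0):
    # Toy EM for a K-class 1D Gaussian mixture p(y_i | z_i = k, theta) = N(mu_k, var_k).
    # E-step: q(z) = p(z | y, theta); M-step: maximize E_q[ln p(y, Z | theta)] over theta.
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    w = np.full(K, 1.0 / K)                    # class proportions
    mu = rng.choice(y, size=K, replace=False)  # initial class means
    var = np.full(K, y.var() + 1e-6)           # initial class variances

    for _ in range(n_iter):
        # E-step: responsibilities q_ik = p(z_i = k | y_i, theta).
        log_p = (np.log(w)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (y[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stabilization
        q = np.exp(log_p)
        q /= q.sum(axis=1, keepdims=True)

        # M-step: closed-form maximizers of the expected complete log-likelihood.
        nk = q.sum(axis=0)
        w = nk / n
        mu = (q * y[:, None]).sum(axis=0) / nk
        var = (q * (y[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6

    return w, mu, var, q

# Example: two well-separated intensity classes; labels follow from argmax_k q_ik.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])
w, mu, var, q = em_gaussian_mixture(y, K=2)
labels = q.argmax(axis=1)

The prior p(θ) plays no role in this sketch; a MAP version would only modify the closed-form M-step updates of w, mu and var.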
 