This function L(λ|O) is called the likelihood of the parameters given the data, or just the likelihood
function. In the maximum likelihood problem, our goal is to find the λ that maximizes L, that is,
find λ* such that

λ* = arg max_λ L(λ|O)        (5.33)
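To make eq. (5.33) concrete, the following minimal Python sketch estimates the parameters of an assumed Gaussian model by numerically maximizing the likelihood of some simulated observations O (the data, the model, the optimizer choice, and all names are illustrative assumptions, not part of the original text):

# Numerical maximum-likelihood estimation, eq. (5.33), for an assumed
# Gaussian model; lambda is the pair (mean, standard deviation).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
O = rng.normal(loc=2.0, scale=1.5, size=500)          # simulated observed data

def neg_log_likelihood(params, data):
    mu, log_sigma = params                            # log-parameterize sigma to keep it positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

# lambda* = arg max L(lambda | O)  is the same as  arg min of the negative log-likelihood
result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(O,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                              # close to the true 2.0 and 1.5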
Often, as in the sketch above, we maximize log(L(λ|O)) because it is analytically easier. The EM
algorithm is particularly useful in situations where one seeks to maximize the likelihood function
and the formulation has hidden variables, which create uncertainty: the observed data do not tell
the full story about the problem. We therefore assume that a complete data set Z = (O, X) exists,
where X takes the hidden variables into consideration, and specify a joint density function
p(z|λ) = p(O, X|λ) = p(X|O, λ) p(O|λ)        (5.34)
From this joint density, we can define the complete-data likelihood

L(λ|Z) = L(λ|O, X) = p(O, X|λ)        (5.35)
Note that L(λ|O, X) = h_{λ,O}(X), where h_{λ,O}(·) is some function, λ and O are constants, and X is a
random variable (the missing information due to the hidden variables). The original likelihood is
referred to as the incomplete-data likelihood.
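As an illustration of eq. (5.35), the sketch below evaluates the complete-data log-likelihood for a hypothetical two-component Gaussian mixture, where the hidden variable X is the component label of each observation (the mixture model, its parameter values, and all names are assumptions made for this example):

# Complete-data log-likelihood, eq. (5.35), for a two-component Gaussian
# mixture; here the hidden labels X are pretended to be known.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
pi = np.array([0.4, 0.6])                             # mixing proportions
mu = np.array([-2.0, 3.0])                            # component means
sigma = np.array([1.0, 0.5])                          # component standard deviations
X = rng.choice(2, size=300, p=pi)                     # hidden component labels
O = rng.normal(mu[X], sigma[X])                       # observed data

def complete_data_log_likelihood(O, X, pi, mu, sigma):
    # log p(O, X | lambda) = sum_n [ log pi_{x_n} + log N(o_n; mu_{x_n}, sigma_{x_n}) ]
    return np.sum(np.log(pi[X]) + norm.logpdf(O, mu[X], sigma[X]))

print(complete_data_log_likelihood(O, X, pi, mu, sigma))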
The EM algorithm is an iterative algorithm with two steps. Let us call the function we wish to
maximize Q(λ, λ^(i-1)), where λ stands for the new parameters to be optimized and λ^(i-1) is the
parameter set at iteration i - 1. First (the expectation, or E, step), EM finds the expected value of
the complete-data log-likelihood log p(O, X|λ) with respect to the unknown data X, given the observed
data O and the current parameter estimates:
Q(λ, λ^(i-1)) = E[log p(O, X|λ) | O, λ^(i-1)] = ∫ log p(O, X|λ) f(X|O, λ^(i-1)) dX        (5.36)
where f(X|O, λ^(i-1)) is the marginal distribution of the unobserved data, which depends on the
observed data and the current parameters.
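For the hypothetical Gaussian mixture used above, f(X|O, λ^(i-1)) factorizes over the observations, so the E step reduces to computing a posterior probability (a "responsibility") for each observation and component. A minimal sketch, again with illustrative names and parameter values:

# E step, eq. (5.36): responsibilities gamma[n, k] = p(x_n = k | o_n, lambda^(i-1))
# for a two-component Gaussian mixture.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
O = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 0.5, 150)])

def e_step(O, pi, mu, sigma):
    joint = pi * norm.pdf(O[:, None], mu, sigma)      # p(o_n, x_n = k | lambda^(i-1))
    return joint / joint.sum(axis=1, keepdims=True)   # normalize over the components k

gamma = e_step(O, pi=np.array([0.5, 0.5]), mu=np.array([0.0, 1.0]), sigma=np.array([1.0, 1.0]))
print(gamma[:3])                                      # posterior over the hidden label of the first samples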
The second step (M step) of the EM algorithm maximizes the expectation computed in the
first step, that is,
λ^(i) = arg max_λ Q(λ, λ^(i-1))        (5.37)
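For the same hypothetical Gaussian mixture, the maximization in eq. (5.37) has a closed form once the E-step responsibilities are fixed. A minimal sketch (the function names and the made-up responsibilities are illustrative):

# M step, eq. (5.37): closed-form parameter updates for a two-component
# Gaussian mixture, given the E-step responsibilities gamma[n, k].
import numpy as np

def m_step(O, gamma):
    Nk = gamma.sum(axis=0)                            # effective number of samples per component
    pi = Nk / len(O)                                  # updated mixing proportions
    mu = (gamma * O[:, None]).sum(axis=0) / Nk        # responsibility-weighted means
    sigma = np.sqrt((gamma * (O[:, None] - mu) ** 2).sum(axis=0) / Nk)
    return pi, mu, sigma

O = np.array([-2.1, 2.9, 3.2])                        # three observations, for illustration
gamma = np.array([[0.95, 0.05], [0.10, 0.90], [0.05, 0.95]])
print(m_step(O, gamma))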
The beauty of this algorithm is that it has been shown to increase the log-likelihood at each
step, which means that convergence to a local maximum is guaranteed [40].
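Putting the two steps together for the hypothetical mixture sketched above, the following loop checks numerically that the incomplete-data log-likelihood never decreases from one iteration to the next (all settings are illustrative):

# Full EM loop for the two-component Gaussian mixture sketched above,
# verifying that log L(lambda | O) is non-decreasing across iterations.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
O = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 200)])

pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
prev_ll = -np.inf
for i in range(50):
    # E step: responsibilities under the current parameter estimates
    joint = pi * norm.pdf(O[:, None], mu, sigma)
    gamma = joint / joint.sum(axis=1, keepdims=True)
    # M step: closed-form update of the parameters
    Nk = gamma.sum(axis=0)
    pi = Nk / len(O)
    mu = (gamma * O[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((gamma * (O[:, None] - mu) ** 2).sum(axis=0) / Nk)
    # incomplete-data log-likelihood after this iteration
    ll = np.sum(np.log((pi * norm.pdf(O[:, None], mu, sigma)).sum(axis=1)))
    assert ll >= prev_ll - 1e-9                       # monotone increase, up to round-off
    prev_ll = ll
print(pi, mu, sigma, prev_ll)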
 