MLE). The complete proof is beyond the scope of this topic and can be found in standard
textbooks on EM. See the references in the bibliographical notes.
• The local optimum to which EM converges depends on the initial parameter θ^(0). A common choice of θ^(0) is the MLE on the small labeled training data (a minimal sketch of this initialization is given below).
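For concreteness, here is a minimal NumPy sketch of that labeled-data initialization; the function name mle_from_labeled and the assumption that labels are coded as 0, ..., n_classes-1 are illustrative choices, not from the text.

```python
import numpy as np

def mle_from_labeled(X_l, y_l, n_classes=2):
    """theta^(0): class priors, means, and covariances estimated from labeled data only."""
    l, d = X_l.shape
    pi = np.zeros(n_classes)
    mu = np.zeros((n_classes, d))
    Sigma = np.zeros((n_classes, d, d))
    for j in range(n_classes):
        Xj = X_l[y_l == j]                    # labeled instances of class j
        pi[j] = len(Xj) / l                   # prior pi_j
        mu[j] = Xj.mean(axis=0)               # mean mu_j
        diff = Xj - mu[j]
        Sigma[j] = diff.T @ diff / len(Xj)    # MLE (biased) covariance Sigma_j
    return pi, mu, Sigma
```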
The above general EM algorithm needs to be specialized for specific generative models.
Example 3.4 (EM for a 2-class GMM with Hidden Variables). We illustrate the EM algorithm
on a simple GMM generative model. In this special case, the observed data is simply a sample of
labeled and unlabeled instances. The hidden variables are the labels of the unlabeled instances. We
learn the parameters of two Gaussians to fit the data using EM as follows:
Algorithm 3.5. EM for GMM.
Input: observed data D = {(x_1, y_1), ..., (x_l, y_l), x_{l+1}, ..., x_{l+u}}

1. Initialize t = 0 and θ^(0) = {π_j^(0), μ_j^(0), Σ_j^(0)} for j ∈ {1, 2} to the MLE estimated from the labeled data.
2. Repeat until the log likelihood log p(D | θ) converges:
3. E-step: For all unlabeled instances i ∈ {l + 1, ..., l + u} and j ∈ {1, 2}, compute

   \gamma_{ij} \equiv p(y_j \mid x_i, \theta^{(t)}) = \frac{\pi_j^{(t)} \, N(x_i; \mu_j^{(t)}, \Sigma_j^{(t)})}{\sum_{k=1}^{2} \pi_k^{(t)} \, N(x_i; \mu_k^{(t)}, \Sigma_k^{(t)})}   (3.15)

   For labeled instances, define γ_ij = 1 if y_i = j, and 0 otherwise.
4. M-step: Find θ^(t+1) using the current γ_ij. For j ∈ {1, 2},

   l_j = \sum_{i=1}^{l+u} \gamma_{ij}   (3.16)

   \mu_j^{(t+1)} = \frac{1}{l_j} \sum_{i=1}^{l+u} \gamma_{ij} \, x_i   (3.17)

   \Sigma_j^{(t+1)} = \frac{1}{l_j} \sum_{i=1}^{l+u} \gamma_{ij} \, (x_i - \mu_j^{(t+1)})(x_i - \mu_j^{(t+1)})^\top   (3.18)

   \pi_j^{(t+1)} = \frac{l_j}{l + u}   (3.19)

5. t = t + 1
Output: {π_j, μ_j, Σ_j} for j ∈ {1, 2}
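A minimal NumPy/SciPy sketch of Algorithm 3.5 is given below, assuming two classes with labels coded as {0, 1}; the function name em_gmm_ssl, the inputs X_l, y_l, X_u, and the small ridge term added to the covariances for numerical stability are illustrative choices, not part of the algorithm as stated above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm_ssl(X_l, y_l, X_u, max_iter=100, tol=1e-6, ridge=1e-6):
    """Semi-supervised EM for a 2-class GMM; labels y_l take values in {0, 1}."""
    X = np.vstack([X_l, X_u])
    l, d = len(X_l), X.shape[1]
    K = 2

    # Step 1: initialize theta^(0) to the MLE on the labeled data alone.
    pi = np.array([np.mean(y_l == j) for j in range(K)])
    mu = np.array([X_l[y_l == j].mean(axis=0) for j in range(K)])
    Sigma = np.array([np.cov(X_l[y_l == j].T, bias=True) + ridge * np.eye(d)
                      for j in range(K)])

    # Responsibilities; labeled instances keep hard assignments (gamma_ij = 1 iff y_i = j).
    gamma = np.zeros((len(X), K))
    gamma[np.arange(l), y_l] = 1.0

    prev_ll = -np.inf
    for _ in range(max_iter):
        # pi_j * N(x_i; mu_j, Sigma_j) for every instance i and class j.
        dens = np.column_stack([
            pi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
            for j in range(K)])

        # Step 2: stop once log p(D | theta) has converged.
        ll = (np.log((dens[:l] * gamma[:l]).sum(axis=1)).sum()
              + np.log(dens[l:].sum(axis=1)).sum())
        if ll - prev_ll < tol:
            break
        prev_ll = ll

        # Step 3 (E-step): soft labels for the unlabeled instances, Eq. (3.15).
        gamma[l:] = dens[l:] / dens[l:].sum(axis=1, keepdims=True)

        # Step 4 (M-step): update theta, Eqs. (3.16)-(3.19).
        l_j = gamma.sum(axis=0)                                   # (3.16)
        mu = (gamma.T @ X) / l_j[:, None]                         # (3.17)
        for j in range(K):
            diff = X - mu[j]
            Sigma[j] = ((gamma[:, j, None] * diff).T @ diff / l_j[j]
                        + ridge * np.eye(d))                      # (3.18)
        pi = l_j / len(X)                                         # (3.19)

    return pi, mu, Sigma, gamma
```

The rows of the returned gamma corresponding to unlabeled instances are the soft labels of Eq. (3.15); the labeled rows stay fixed at their hard assignments throughout.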
Note that the algorithm begins by finding the MLE of the labeled data alone. The E-step then
computes the γ values, which can be thought of as soft assignments of class labels to instances. These
are sometimes referred to as “responsibilities” (i.e., class 1 is responsible for x_i with probability γ_i1).
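To make the responsibility interpretation concrete, the short sketch below evaluates Eq. (3.15) for a single unlabeled point under two made-up Gaussians (all parameter values are illustrative, not from the text; classes 1 and 2 are coded as indices 0 and 1).

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters for the two classes.
pi = np.array([0.6, 0.4])
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = [np.eye(2), np.eye(2)]

x_i = np.array([1.0, 1.2])  # one unlabeled instance

# Numerators of Eq. (3.15): pi_j * N(x_i; mu_j, Sigma_j).
num = np.array([pi[j] * multivariate_normal.pdf(x_i, mu[j], Sigma[j])
                for j in range(2)])
gamma_i = num / num.sum()   # responsibilities; they sum to 1
print(gamma_i)              # class 1 is "responsible" for x_i with probability gamma_i[0]
```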
 