MLE). The complete proof is beyond the scope of this topic and can be found in standard
textbooks on EM. See the references in the bibliographical notes.
• The local optimum to which EM converges depends on the initial parameter θ^(0). A common choice of θ^(0) is the MLE on the small labeled training data (a minimal sketch of this initialization is given below).
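For concreteness, here is a minimal NumPy sketch of that labeled-data initialization; the function name mle_from_labeled and the assumption that labels are coded as 0, ..., n_classes-1 are illustrative choices, not from the text.

```python
import numpy as np

def mle_from_labeled(X_l, y_l, n_classes=2):
    """theta^(0): class priors, means, and covariances estimated from labeled data only."""
    l, d = X_l.shape
    pi = np.zeros(n_classes)
    mu = np.zeros((n_classes, d))
    Sigma = np.zeros((n_classes, d, d))
    for j in range(n_classes):
        Xj = X_l[y_l == j]                    # labeled instances of class j
        pi[j] = len(Xj) / l                   # prior pi_j
        mu[j] = Xj.mean(axis=0)               # mean mu_j
        diff = Xj - mu[j]
        Sigma[j] = diff.T @ diff / len(Xj)    # MLE (biased) covariance Sigma_j
    return pi, mu, Sigma
```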
The above general EM algorithm needs to be specialized for specific generative models.
Example 3.4 (EM for a 2-class GMM with Hidden Variables). We illustrate the EM algorithm
on a simple GMM generative model. In this special case, the observed data is simply a sample of
labeled and unlabeled instances. The hidden variables are the labels of the unlabeled instances. We
learn the parameters of two Gaussians to fit the data using EM as follows:
Algorithm 3.5. EM for GMM.
Input: observed data D = {(x_1, y_1), ..., (x_l, y_l), x_{l+1}, ..., x_{l+u}}

1. Initialize t = 0 and θ^(0) = {π_j^(0), μ_j^(0), Σ_j^(0)} for j ∈ {1, 2} to the MLE estimated from the labeled data.
2. Repeat until the log likelihood log p(D | θ) converges:
3. E-step: For all unlabeled instances i ∈ {l + 1, ..., l + u} and j ∈ {1, 2}, compute

   \gamma_{ij} \equiv p(y_j \mid x_i, \theta^{(t)}) = \frac{\pi_j^{(t)} \, N(x_i; \mu_j^{(t)}, \Sigma_j^{(t)})}{\sum_{k=1}^{2} \pi_k^{(t)} \, N(x_i; \mu_k^{(t)}, \Sigma_k^{(t)})}   (3.15)

   For labeled instances, define γ_ij = 1 if y_i = j, and 0 otherwise.
4. M-step: Find θ^(t+1) using the current γ_ij. For j ∈ {1, 2},

   l_j = \sum_{i=1}^{l+u} \gamma_{ij}   (3.16)

   \mu_j^{(t+1)} = \frac{1}{l_j} \sum_{i=1}^{l+u} \gamma_{ij} \, x_i   (3.17)

   \Sigma_j^{(t+1)} = \frac{1}{l_j} \sum_{i=1}^{l+u} \gamma_{ij} \, (x_i - \mu_j^{(t+1)})(x_i - \mu_j^{(t+1)})^\top   (3.18)

   \pi_j^{(t+1)} = \frac{l_j}{l + u}   (3.19)

5. t = t + 1
Output: {π_j, μ_j, Σ_j} for j ∈ {1, 2}
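A minimal NumPy/SciPy sketch of Algorithm 3.5 is given below, assuming two classes with labels coded as {0, 1}; the function name em_gmm_ssl, the inputs X_l, y_l, X_u, and the small ridge term added to the covariances for numerical stability are illustrative choices, not part of the algorithm as stated above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm_ssl(X_l, y_l, X_u, max_iter=100, tol=1e-6, ridge=1e-6):
    """Semi-supervised EM for a 2-class GMM; labels y_l take values in {0, 1}."""
    X = np.vstack([X_l, X_u])
    l, d = len(X_l), X.shape[1]
    K = 2

    # Step 1: initialize theta^(0) to the MLE on the labeled data alone.
    pi = np.array([np.mean(y_l == j) for j in range(K)])
    mu = np.array([X_l[y_l == j].mean(axis=0) for j in range(K)])
    Sigma = np.array([np.cov(X_l[y_l == j].T, bias=True) + ridge * np.eye(d)
                      for j in range(K)])

    # Responsibilities; labeled instances keep hard assignments (gamma_ij = 1 iff y_i = j).
    gamma = np.zeros((len(X), K))
    gamma[np.arange(l), y_l] = 1.0

    prev_ll = -np.inf
    for _ in range(max_iter):
        # pi_j * N(x_i; mu_j, Sigma_j) for every instance i and class j.
        dens = np.column_stack([
            pi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=Sigma[j])
            for j in range(K)])

        # Step 2: stop once log p(D | theta) has converged.
        ll = (np.log((dens[:l] * gamma[:l]).sum(axis=1)).sum()
              + np.log(dens[l:].sum(axis=1)).sum())
        if ll - prev_ll < tol:
            break
        prev_ll = ll

        # Step 3 (E-step): soft labels for the unlabeled instances, Eq. (3.15).
        gamma[l:] = dens[l:] / dens[l:].sum(axis=1, keepdims=True)

        # Step 4 (M-step): update theta, Eqs. (3.16)-(3.19).
        l_j = gamma.sum(axis=0)                                   # (3.16)
        mu = (gamma.T @ X) / l_j[:, None]                         # (3.17)
        for j in range(K):
            diff = X - mu[j]
            Sigma[j] = ((gamma[:, j, None] * diff).T @ diff / l_j[j]
                        + ridge * np.eye(d))                      # (3.18)
        pi = l_j / len(X)                                         # (3.19)

    return pi, mu, Sigma, gamma
```

The rows of the returned gamma corresponding to unlabeled instances are the soft labels of Eq. (3.15); the labeled rows stay fixed at their hard assignments throughout.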
Note that the algorithm begins by finding the MLE of the labeled data alone. The E-step then
computes the γ values, which can be thought of as soft assignments of class labels to instances. These
are sometimes referred to as “responsibilities” (i.e., class 1 is responsible for x_i with probability γ_i1).
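To make the responsibility interpretation concrete, the short sketch below evaluates Eq. (3.15) for a single unlabeled point under two made-up Gaussians (all parameter values are illustrative, not from the text; classes 1 and 2 are coded as indices 0 and 1).

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters for the two classes.
pi = np.array([0.6, 0.4])
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = [np.eye(2), np.eye(2)]

x_i = np.array([1.0, 1.2])  # one unlabeled instance

# Numerators of Eq. (3.15): pi_j * N(x_i; mu_j, Sigma_j).
num = np.array([pi[j] * multivariate_normal.pdf(x_i, mu[j], Sigma[j])
                for j in range(2)])
gamma_i = num / num.sum()   # responsibilities; they sum to 1
print(gamma_i)              # class 1 is "responsible" for x_i with probability gamma_i[0]
```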
 