In supervised learning when $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$, the MLE is usually easy to find. We can rewrite the log likelihood as
\[
\log p(\mathcal{D} \mid \theta) = \log \prod_{i=1}^{l} p(\mathbf{x}_i, y_i \mid \theta) = \sum_{i=1}^{l} \log p(y_i \mid \theta)\, p(\mathbf{x}_i \mid y_i, \theta), \tag{3.6}
\]
where we used the fact that the probability of a set of i.i.d. events is the product of individual
probabilities. Finding an MLE is an optimization problem to maximize the log likelihood. In
supervised learning, the optimization problem is often straightforward and yields intuitive MLE
solutions, as the next example shows.
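As a concrete illustration of (3.6), the following is a minimal Python sketch (with hypothetical toy data and an arbitrary parameter setting $\theta$, not taken from the text) that evaluates the log likelihood both as the log of the product of joint probabilities and as the sum of log class priors plus log class-conditional densities; the two quantities agree up to floating-point error.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy labeled data D = {(x_i, y_i)}, l = 4, two classes in 2-D (illustrative values only).
X = np.array([[0.5, 1.2], [1.0, 0.8], [-1.1, -0.4], [-0.7, -1.3]])
y = np.array([0, 0, 1, 1])

# An arbitrary parameter setting theta = (class priors, means, covariances).
priors = np.array([0.5, 0.5])
means = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
covs = [np.eye(2), np.eye(2)]

# Right-hand side of (3.6): sum_i [ log p(y_i | theta) + log p(x_i | y_i, theta) ].
ll_factored = sum(
    np.log(priors[yi]) + multivariate_normal.logpdf(xi, mean=means[yi], cov=covs[yi])
    for xi, yi in zip(X, y)
)

# Left-hand side of (3.6): log of the product of joint probabilities p(x_i, y_i | theta).
ll_joint = np.log(np.prod([
    priors[yi] * multivariate_normal.pdf(xi, mean=means[yi], cov=covs[yi])
    for xi, yi in zip(X, y)
]))

print(ll_factored, ll_joint)  # equal up to floating-point error
```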
Example 3.2 (MLE for Gaussian Mixture Model, All Labeled Data). We now present the derivation of the maximum likelihood estimate for a 2-class Gaussian mixture model when $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$. We begin by setting up the constrained optimization problem
\[
\hat{\theta} = \operatorname*{argmax}_{\theta} \; \log p(\mathcal{D} \mid \theta) \quad \text{s.t.} \quad \sum_{j=1}^{2} p(y_j \mid \theta) = 1, \tag{3.7}
\]
where we enforce the constraint that the class priors must sum to 1. We next introduce a Lagrange
multiplier β to form the Lagrangian (see [99] for a tutorial on Lagrange multipliers)
\[
\begin{aligned}
\Lambda(\theta, \beta) &= \log p(\mathcal{D} \mid \theta) - \beta \Big( \sum_{j=1}^{2} p(y_j \mid \theta) - 1 \Big) \\
&= \sum_{i=1}^{l} \log p(\mathbf{x}_i, y_i \mid \theta) - \beta \Big( \sum_{j=1}^{2} p(y_j \mid \theta) - 1 \Big) \\
&= \sum_{i=1}^{l} \log p(y_i \mid \theta)\, p(\mathbf{x}_i \mid y_i, \theta) - \beta \Big( \sum_{j=1}^{2} p(y_j \mid \theta) - 1 \Big) \\
&= \sum_{i=1}^{l} \log \pi_{y_i} + \sum_{i=1}^{l} \log \mathcal{N}(\mathbf{x}_i; \mu_{y_i}, \Sigma_{y_i}) - \beta \Big( \sum_{j=1}^{2} \pi_j - 1 \Big),
\end{aligned}
\]
where $\pi_j, \mu_j, \Sigma_j$ for $j \in \{1, 2\}$ are the class priors and the Gaussian means and covariance matrices. We
compute the partial derivatives with respect to all the parameters. We then set each partial derivative
to zero to obtain the intuitive closed-form MLE solution:
\[
\frac{\partial \Lambda}{\partial \beta} = \sum_{j=1}^{2} \pi_j - 1 = 0 \;\Longrightarrow\; \sum_{j=1}^{2} \pi_j = 1. \tag{3.8}
\]
Clearly, the role of the Lagrange multiplier β is to enforce the normalization constraint on the class priors.
\[
\frac{\partial \Lambda}{\partial \pi_j} = \sum_{i: y_i = j} \frac{1}{\pi_j} - \beta = \frac{l_j}{\pi_j} - \beta = 0 \;\Longrightarrow\; \pi_j = \frac{l_j}{\beta} = \frac{l_j}{l}, \tag{3.9}
\]
where $l_j$ is the number of labeled instances with class label $j$; substituting $\pi_j = l_j / \beta$ into (3.8) gives $\beta = \sum_{j=1}^{2} l_j = l$.
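For completeness, here is a small numerical sketch of this closed-form solution on hypothetical toy data: the class prior MLE of (3.9) is just the fraction of labeled instances in each class, and it satisfies the constraint of (3.8) by construction. The per-class sample mean and (maximum-likelihood) covariance shown at the end are the standard Gaussian MLE results, whose derivation parallels (3.9) and is not reproduced here.

```python
import numpy as np

# Toy labeled data (hypothetical values, for illustration only).
X = np.array([[0.5, 1.2], [1.0, 0.8], [0.9, 1.1], [-1.1, -0.4], [-0.7, -1.3]])
y = np.array([1, 1, 1, 2, 2])   # class labels j in {1, 2}, matching the text
l = len(y)

# Equation (3.9): the class prior MLE is the fraction of labeled instances in class j.
classes = np.unique(y)
priors = {j: np.sum(y == j) / l for j in classes}
assert np.isclose(sum(priors.values()), 1.0)   # constraint (3.8) holds by construction

# The remaining parameters take the analogous intuitive form: per-class sample mean
# and covariance (the ML covariance divides by l_j, hence bias=True).
means = {j: X[y == j].mean(axis=0) for j in classes}
covs = {j: np.cov(X[y == j], rowvar=False, bias=True) for j in classes}

print(priors)             # class priors 0.6 and 0.4 for this toy labeling
print(means[1], covs[1])  # sample mean and ML covariance of class 1
```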
 