where $l_j$ is the number of instances in class $j$. Note that we find $\beta = l$ by substituting (3.9) into (3.8) and rearranging. In the end, we see that the MLE for the class prior is simply the fraction of
instances in each class. We next solve for the class means. For this we need derivatives with respect to the vector $\mu_j$. In general, for a vector $\mathbf{v}$ and a symmetric matrix $A$ of the appropriate size, we have
$$
\frac{\partial\, \mathbf{v}^\top A \mathbf{v}}{\partial \mathbf{v}} = 2 A \mathbf{v}.
$$
This leads to
$$
\frac{\partial}{\partial \mu_j} \sum_{i:\, y_i = j} -\frac{1}{2} (\mathbf{x}_i - \mu_j)^\top \Sigma_j^{-1} (\mathbf{x}_i - \mu_j)
= \sum_{i:\, y_i = j} \Sigma_j^{-1} (\mathbf{x}_i - \mu_j) = 0
\;\;\Longrightarrow\;\;
\mu_j = \frac{1}{l_j} \sum_{i:\, y_i = j} \mathbf{x}_i.
\tag{3.10}
$$
We see that the MLE for each class mean is simply the class's sample mean. Finally, the MLE
solution for the covariance matrices is
$$
\Sigma_j = \frac{1}{l_j} \sum_{i:\, y_i = j} (\mathbf{x}_i - \mu_j)(\mathbf{x}_i - \mu_j)^\top,
\tag{3.11}
$$
which is the sample covariance for the instances of that class.
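To make the closed-form solution concrete, the following sketch computes the supervised MLEs (3.9)-(3.11) from fully labeled data. It is a minimal illustration in NumPy; the function and variable names are our own, and classes are indexed $0, \ldots, C-1$ here rather than $1, \ldots, C$ as in the text.

```python
import numpy as np

def supervised_gmm_mle(X, y, num_classes):
    """Closed-form MLE for a Gaussian mixture model from fully labeled data.

    X : (l, d) array of labeled instances
    y : (l,) integer array of class labels in {0, ..., num_classes - 1}
    Returns the class priors, means, and covariance matrices,
    mirroring (3.9), (3.10), and (3.11).
    """
    l, d = X.shape
    priors = np.zeros(num_classes)
    means = np.zeros((num_classes, d))
    covs = np.zeros((num_classes, d, d))
    for j in range(num_classes):
        Xj = X[y == j]                  # instances with y_i = j
        lj = len(Xj)
        priors[j] = lj / l              # (3.9): fraction of instances in class j
        means[j] = Xj.mean(axis=0)      # (3.10): class sample mean
        diff = Xj - means[j]
        covs[j] = diff.T @ diff / lj    # (3.11): class sample covariance (MLE divides by l_j)
    return priors, means, covs
```

Note that (3.11) divides by $l_j$ rather than $l_j - 1$: the maximum likelihood estimate of the covariance is the (biased) sample covariance.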
3.2 MIXTURE MODELS FOR SEMI-SUPERVISED CLASSIFICATION
In semi-supervised learning, the training data $\mathcal{D}$ consists of both labeled and unlabeled data. The likelihood depends
on both the labeled and unlabeled data—this is how unlabeled data might help semi-supervised
learning in mixture models. It is no longer possible to solve the MLE analytically. However, as we
will see in the next section, one can find a local maximum of the parameter estimate using an iterative
procedure known as the EM algorithm.
Since the training data $\mathcal{D}$ consists of both labeled and unlabeled data, i.e., $\mathcal{D} = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l), \mathbf{x}_{l+1}, \ldots, \mathbf{x}_{l+u}\}$, the log likelihood function is now defined as
$$
\log p(\mathcal{D} \mid \theta) = \log \left( \prod_{i=1}^{l} p(\mathbf{x}_i, y_i \mid \theta) \prod_{i=l+1}^{l+u} p(\mathbf{x}_i \mid \theta) \right)
\tag{3.12}
$$
$$
= \sum_{i=1}^{l} \log p(y_i \mid \theta)\, p(\mathbf{x}_i \mid y_i, \theta) + \sum_{i=l+1}^{l+u} \log p(\mathbf{x}_i \mid \theta).
\tag{3.13}
$$
The essential difference between this semi-supervised log likelihood (3.13) and the previous supervised log likelihood (3.6) is the second term for unlabeled instances. We call $p(\mathbf{x} \mid \theta)$ the marginal probability, which is defined as
$$
p(\mathbf{x} \mid \theta) = \sum_{y=1}^{C} p(\mathbf{x}, y \mid \theta) = \sum_{y=1}^{C} p(y \mid \theta)\, p(\mathbf{x} \mid y, \theta).
\tag{3.14}
$$
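As an illustration of how (3.13) and (3.14) combine, the sketch below evaluates the semi-supervised log likelihood of a Gaussian mixture under a given parameter setting, using the marginal (3.14) for the unlabeled instances. It assumes SciPy is available for the Gaussian density; the function names are our own, labels are again indexed from 0, and in practice this quantity would be maximized by the EM algorithm of the next section rather than evaluated in isolation.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def semi_supervised_log_likelihood(X_l, y_l, X_u, priors, means, covs):
    """Evaluate the semi-supervised log likelihood (3.13) under parameters theta.

    X_l, y_l : labeled instances and their integer labels in {0, ..., C-1}
    X_u      : unlabeled instances
    priors, means, covs : the mixture parameters (theta)
    """
    C = len(priors)
    # Labeled term: sum_i log p(y_i | theta) p(x_i | y_i, theta)
    labeled = sum(
        np.log(priors[y]) + multivariate_normal.logpdf(x, mean=means[y], cov=covs[y])
        for x, y in zip(X_l, y_l)
    )
    # Unlabeled term: sum_i log p(x_i | theta), where the marginal (3.14)
    # p(x | theta) = sum_y p(y | theta) p(x | y, theta) is computed stably
    # in log space with logsumexp.
    unlabeled = 0.0
    for x in X_u:
        log_joint = [
            np.log(priors[j]) + multivariate_normal.logpdf(x, mean=means[j], cov=covs[j])
            for j in range(C)
        ]
        unlabeled += logsumexp(log_joint)
    return labeled + unlabeled
```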
 