Formally, let x ∈ X be an instance. We are interested in predicting its class label y. We will employ a probabilistic approach that seeks the label that maximizes the conditional probability p(y | x). This conditional probability specifies how likely each class label is, given the instance. By definition, p(y | x) ∈ [0, 1] for all y, and Σ_y p(y | x) = 1. If we want to minimize classification error, the best strategy is to always classify x into the most likely class ŷ:¹

ŷ = argmax_y p(y | x).    (3.1)
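As a minimal sketch, the decision rule in Eq. (3.1) amounts to taking an argmax over the posterior values; the probabilities below are invented for illustration:

```python
# Hypothetical posterior p(y | x) for one instance x, stored as a dict.
posterior = {"male": 0.3, "female": 0.7}

# Eq. (3.1): classify x into the label with the highest conditional probability.
y_hat = max(posterior, key=posterior.get)
print(y_hat)  # -> female
```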
Note that if different types of misclassification (e.g., wrongly classifying a benign tumor as malignant vs. the other way around) incur different amounts of loss, the above strategy may not be optimal in terms of minimizing the expected loss. We defer the discussion of loss minimization to later chapters, but note that it is straightforward to handle loss in probabilistic models.
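To see why the argmax rule can be suboptimal under asymmetric losses, here is a small sketch: instead of the most likely label, one picks the label minimizing the expected loss. The posterior and loss values are made up for illustration:

```python
# Hypothetical posterior for a tumor instance.
posterior = {"benign": 0.7, "malignant": 0.3}

# loss[true][predicted]: missing a malignant tumor is assumed far
# costlier than a false alarm (the numbers are illustrative only).
loss = {
    "benign":    {"benign": 0.0,  "malignant": 1.0},
    "malignant": {"benign": 10.0, "malignant": 0.0},
}

def expected_loss(pred):
    # E[loss | predict pred] = sum over y of p(y | x) * loss(y, pred)
    return sum(posterior[y] * loss[y][pred] for y in posterior)

# Minimum-expected-loss decision: here it predicts "malignant"
# even though p(benign | x) > p(malignant | x).
y_hat = min(posterior, key=expected_loss)
```

With these numbers, predicting "benign" has expected loss 0.3 × 10 = 3.0 while predicting "malignant" costs only 0.7 × 1 = 0.7, so the two rules disagree.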
How do we compute p(y | x)? One approach is to use a generative model, which employs the Bayes rule:

p(y | x) = p(x | y) p(y) / Σ_{y'} p(x | y') p(y'),    (3.2)

where the summation in the denominator is over all class labels y'. p(x | y) is called the class conditional probability, and p(y) the prior probability. It is useful to illustrate these probability notations using the alien gender example:
• For a specific alien, x is the (weight, height) feature vector, and p(y | x) is a probability distribution over two outcomes: male or female. That is, p(y = male | x) + p(y = female | x) = 1. There are infinitely many p(y | x) distributions, one for each feature vector x.
• There are only two class conditional distributions: p(x | y = male) and p(x | y = female). Each is a continuous (e.g., Gaussian) distribution over feature vectors. In other words, some weight and height combinations are more likely than others for each gender, and p(x | y) specifies these differences.
• The prior probabilities p(y = male) and p(y = female) specify the proportions of males and females in the alien population.
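These notations can be made concrete with a small numerical sketch. Assuming hypothetical Gaussian class-conditional densities over (weight, height), with independent dimensions, and made-up priors, Eq. (3.2) yields the posterior:

```python
import math

# Hypothetical class-conditional parameters: an independent Gaussian
# per dimension of (weight, height), one (mean, std) pair each.
params = {
    "male":   {"mean": (180.0, 70.0), "std": (15.0, 4.0)},
    "female": {"mean": (140.0, 64.0), "std": (12.0, 3.5)},
}
prior = {"male": 0.5, "female": 0.5}  # assumed population proportions

def gaussian_pdf(v, mean, std):
    # Univariate Gaussian density.
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def class_conditional(x, y):
    # p(x | y): product over the independent dimensions.
    p = params[y]
    return math.prod(gaussian_pdf(v, m, s)
                     for v, m, s in zip(x, p["mean"], p["std"]))

def posterior(x):
    # Eq. (3.2): p(y | x) = p(x | y) p(y) / sum over y' of p(x | y') p(y').
    joint = {y: class_conditional(x, y) * prior[y] for y in prior}
    z = sum(joint.values())
    return {y: j / z for y, j in joint.items()}

post = posterior((150.0, 66.0))  # posterior sums to 1 by construction
```

For a lightweight alien at (150, 66), this assumed model puts most of the posterior mass on female, since that point lies closer to the female class-conditional mean.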
Furthermore, one can hypothetically “generate” i.i.d. instance-label pairs (x, y) from these probability distributions by repeating the following two steps, hence the name generative model:²

1. Sample y ∼ p(y). In the alien example, one can think of p(y) as the probability of heads of a biased coin. Flipping the coin then selects a gender.
2. Sample x ∼ p(x | y). In the alien example, this samples a two-dimensional feature vector to describe an alien of the gender chosen in step 1.
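The two sampling steps above can be sketched directly; the coin bias and the Gaussian parameters are assumptions for illustration:

```python
import random

random.seed(0)  # for reproducibility of the sketch

# Hypothetical generative model: a biased coin for y, then a
# gender-specific Gaussian over (weight, height) for x.
prior = {"male": 0.5, "female": 0.5}
params = {
    "male":   {"mean": (180.0, 70.0), "std": (15.0, 4.0)},
    "female": {"mean": (140.0, 64.0), "std": (12.0, 3.5)},
}

def sample_pair():
    # Step 1: sample y ~ p(y), i.e., flip the biased coin.
    y = "male" if random.random() < prior["male"] else "female"
    # Step 2: sample x ~ p(x | y), one Gaussian draw per dimension.
    p = params[y]
    x = tuple(random.gauss(m, s) for m, s in zip(p["mean"], p["std"]))
    return x, y

pairs = [sample_pair() for _ in range(5)]  # i.i.d. (x, y) pairs
```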
¹ Note that a “hat” on a variable (e.g., ŷ, θ̂) indicates we are referring to an estimated or predicted value.
² An alternative to generative models is discriminative models, which focus on distinguishing the classes without worrying about the process underlying the data generation.