Formally, let x ∈ X be an instance. We are interested in predicting its class label y. We will employ a probabilistic approach that seeks the label that maximizes the conditional probability p(y | x). This conditional probability specifies how likely each class label is, given the instance. By definition, p(y | x) ∈ [0, 1] for all y, and Σ_y p(y | x) = 1. If we want to minimize classification error, the best strategy is to always classify x into the most likely class ŷ:¹

ŷ = argmax_y p(y | x).    (3.1)
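As a minimal sketch, the decision rule in Eq. (3.1) amounts to taking an argmax over the posterior values; the probabilities below are invented for illustration:

```python
# Hypothetical posterior p(y | x) for one instance x, stored as a dict.
posterior = {"male": 0.3, "female": 0.7}

# Eq. (3.1): classify x into the label with the highest conditional probability.
y_hat = max(posterior, key=posterior.get)
print(y_hat)  # -> female
```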
Note that if different types of misclassification (e.g., wrongly classifying a benign tumor as malignant vs. the other way around) incur different amounts of loss, the above strategy may not be optimal in terms of minimizing the expected loss. We defer the discussion of loss minimization to later chapters, but note that it is straightforward to handle loss in probabilistic models.
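To see why the argmax rule can be suboptimal under asymmetric losses, here is a small sketch: instead of the most likely label, one picks the label minimizing the expected loss. The posterior and loss values are made up for illustration:

```python
# Hypothetical posterior for a tumor instance.
posterior = {"benign": 0.7, "malignant": 0.3}

# loss[true][predicted]: missing a malignant tumor is assumed far
# costlier than a false alarm (the numbers are illustrative only).
loss = {
    "benign":    {"benign": 0.0,  "malignant": 1.0},
    "malignant": {"benign": 10.0, "malignant": 0.0},
}

def expected_loss(pred):
    # E[loss | predict pred] = sum over y of p(y | x) * loss(y, pred)
    return sum(posterior[y] * loss[y][pred] for y in posterior)

# Minimum-expected-loss decision: here it predicts "malignant"
# even though p(benign | x) > p(malignant | x).
y_hat = min(posterior, key=expected_loss)
```

With these numbers, predicting "benign" has expected loss 0.3 × 10 = 3.0 while predicting "malignant" costs only 0.7 × 1 = 0.7, so the two rules disagree.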
How do we compute p(y | x)? One approach is to use a generative model, which employs the Bayes rule:

p(y | x) = p(x | y) p(y) / Σ_{y'} p(x | y') p(y'),    (3.2)

where the summation in the denominator is over all class labels y'. p(x | y) is called the class conditional probability, and p(y) the prior probability. It is useful to illustrate these probability notations using the alien gender example:
• For a specific alien, x is the (weight, height) feature vector, and p(y | x) is a probability distribution over two outcomes: male or female. That is, p(y = male | x) + p(y = female | x) = 1. There are infinitely many p(y | x) distributions, one for each feature vector x.
• There are only two class conditional distributions: p(x | y = male) and p(x | y = female). Each is a continuous (e.g., Gaussian) distribution over feature vectors. In other words, some weight and height combinations are more likely than others for each gender, and p(x | y) specifies these differences.
• The prior probabilities p(y = male) and p(y = female) specify the proportions of males and females in the alien population.
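These notations can be made concrete with a small numerical sketch. Assuming hypothetical Gaussian class-conditional densities over (weight, height), with independent dimensions, and made-up priors, Eq. (3.2) yields the posterior:

```python
import math

# Hypothetical class-conditional parameters: an independent Gaussian
# per dimension of (weight, height), one (mean, std) pair each.
params = {
    "male":   {"mean": (180.0, 70.0), "std": (15.0, 4.0)},
    "female": {"mean": (140.0, 64.0), "std": (12.0, 3.5)},
}
prior = {"male": 0.5, "female": 0.5}  # assumed population proportions

def gaussian_pdf(v, mean, std):
    # Univariate Gaussian density.
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def class_conditional(x, y):
    # p(x | y): product over the independent dimensions.
    p = params[y]
    return math.prod(gaussian_pdf(v, m, s)
                     for v, m, s in zip(x, p["mean"], p["std"]))

def posterior(x):
    # Eq. (3.2): p(y | x) = p(x | y) p(y) / sum over y' of p(x | y') p(y').
    joint = {y: class_conditional(x, y) * prior[y] for y in prior}
    z = sum(joint.values())
    return {y: j / z for y, j in joint.items()}

post = posterior((150.0, 66.0))  # posterior sums to 1 by construction
```

For a lightweight alien at (150, 66), this assumed model puts most of the posterior mass on female, since that point lies closer to the female class-conditional mean.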
Furthermore, one can hypothetically “generate” i.i.d. instance-label pairs (x, y) from these probability distributions by repeating the following two steps, hence the name generative model:²

1. Sample y ∼ p(y). In the alien example, one can think of p(y) as the probability of heads of a biased coin. Flipping the coin then selects a gender.
2. Sample x ∼ p(x | y). In the alien example, this samples a two-dimensional feature vector to describe an alien of the gender chosen in step 1.
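The two sampling steps above can be sketched directly; the coin bias and the Gaussian parameters are assumptions for illustration:

```python
import random

random.seed(0)  # for reproducibility of the sketch

# Hypothetical generative model: a biased coin for y, then a
# gender-specific Gaussian over (weight, height) for x.
prior = {"male": 0.5, "female": 0.5}
params = {
    "male":   {"mean": (180.0, 70.0), "std": (15.0, 4.0)},
    "female": {"mean": (140.0, 64.0), "std": (12.0, 3.5)},
}

def sample_pair():
    # Step 1: sample y ~ p(y), i.e., flip the biased coin.
    y = "male" if random.random() < prior["male"] else "female"
    # Step 2: sample x ~ p(x | y), one Gaussian draw per dimension.
    p = params[y]
    x = tuple(random.gauss(m, s) for m, s in zip(p["mean"], p["std"]))
    return x, y

pairs = [sample_pair() for _ in range(5)]  # i.i.d. (x, y) pairs
```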
¹ Note that a “hat” on a variable (e.g., ŷ, θ̂) indicates we are referring to an estimated or predicted value.
² An alternative to generative models is discriminative models, which focus on distinguishing the classes without worrying about the process underlying the data generation.