where $l_j$ is the number of instances in class $j$. Note that we find $\beta = l$ by substituting (3.9) into (3.8) and rearranging. In the end, we see that the MLE for the class prior is simply the fraction of
instances in each class. We next solve for the class means. For this we need derivatives with respect to the vector $\mu_j$. In general, for a vector $\mathbf{v}$ and a symmetric matrix $A$ of the appropriate size, we have
$$
\frac{\partial\, \mathbf{v}^\top A \mathbf{v}}{\partial \mathbf{v}} = 2 A \mathbf{v}.
$$
This leads to
$$
\frac{\partial}{\partial \mu_j} \sum_{i:\, y_i = j} -\frac{1}{2} (\mathbf{x}_i - \mu_j)^\top \Sigma_j^{-1} (\mathbf{x}_i - \mu_j)
= \sum_{i:\, y_i = j} \Sigma_j^{-1} (\mathbf{x}_i - \mu_j) = 0
\;\;\Longrightarrow\;\;
\mu_j = \frac{1}{l_j} \sum_{i:\, y_i = j} \mathbf{x}_i.
\tag{3.10}
$$
We see that the MLE for each class mean is simply the class's sample mean. Finally, the MLE
solution for the covariance matrices is
$$
\Sigma_j = \frac{1}{l_j} \sum_{i:\, y_i = j} (\mathbf{x}_i - \mu_j)(\mathbf{x}_i - \mu_j)^\top,
\tag{3.11}
$$
which is the sample covariance for the instances of that class.
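To make the closed-form solution concrete, the following sketch computes the supervised MLEs (3.9)-(3.11) from fully labeled data. It is a minimal illustration in NumPy; the function and variable names are our own, and classes are indexed $0, \ldots, C-1$ here rather than $1, \ldots, C$ as in the text.

```python
import numpy as np

def supervised_gmm_mle(X, y, num_classes):
    """Closed-form MLE for a Gaussian mixture model from fully labeled data.

    X : (l, d) array of labeled instances
    y : (l,) integer array of class labels in {0, ..., num_classes - 1}
    Returns the class priors, means, and covariance matrices,
    mirroring (3.9), (3.10), and (3.11).
    """
    l, d = X.shape
    priors = np.zeros(num_classes)
    means = np.zeros((num_classes, d))
    covs = np.zeros((num_classes, d, d))
    for j in range(num_classes):
        Xj = X[y == j]                  # instances with y_i = j
        lj = len(Xj)
        priors[j] = lj / l              # (3.9): fraction of instances in class j
        means[j] = Xj.mean(axis=0)      # (3.10): class sample mean
        diff = Xj - means[j]
        covs[j] = diff.T @ diff / lj    # (3.11): class sample covariance (MLE divides by l_j)
    return priors, means, covs
```

Note that (3.11) divides by $l_j$ rather than $l_j - 1$: the maximum likelihood estimate of the covariance is the (biased) sample covariance.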
3.2 MIXTURE MODELS FOR SEMI-SUPERVISED CLASSIFICATION
In semi-supervised learning, the training data $\mathcal{D}$ consists of both labeled and unlabeled data. The likelihood depends
on both the labeled and unlabeled data—this is how unlabeled data might help semi-supervised
learning in mixture models. It is no longer possible to solve the MLE analytically. However, as we
will see in the next section, one can find a local maximum of the parameter estimate using an iterative
procedure known as the EM algorithm.
Since the training data $\mathcal{D}$ consists of both labeled and unlabeled data, i.e., $\mathcal{D} = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l), \mathbf{x}_{l+1}, \ldots, \mathbf{x}_{l+u}\}$, the log likelihood function is now defined as
$$
\log p(\mathcal{D} \mid \theta) = \log \left( \prod_{i=1}^{l} p(\mathbf{x}_i, y_i \mid \theta) \prod_{i=l+1}^{l+u} p(\mathbf{x}_i \mid \theta) \right)
\tag{3.12}
$$
$$
= \sum_{i=1}^{l} \log p(y_i \mid \theta)\, p(\mathbf{x}_i \mid y_i, \theta) + \sum_{i=l+1}^{l+u} \log p(\mathbf{x}_i \mid \theta).
\tag{3.13}
$$
The essential difference between this semi-supervised log likelihood (3.13) and the previous supervised log likelihood (3.6) is the second term for unlabeled instances. We call $p(\mathbf{x} \mid \theta)$ the marginal probability, which is defined as
$$
p(\mathbf{x} \mid \theta) = \sum_{y=1}^{C} p(\mathbf{x}, y \mid \theta) = \sum_{y=1}^{C} p(y \mid \theta)\, p(\mathbf{x} \mid y, \theta).
\tag{3.14}
$$
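As an illustration of how (3.13) and (3.14) combine, the sketch below evaluates the semi-supervised log likelihood of a Gaussian mixture under a given parameter setting, using the marginal (3.14) for the unlabeled instances. It assumes SciPy is available for the Gaussian density; the function names are our own, labels are again indexed from 0, and in practice this quantity would be maximized by the EM algorithm of the next section rather than evaluated in isolation.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def semi_supervised_log_likelihood(X_l, y_l, X_u, priors, means, covs):
    """Evaluate the semi-supervised log likelihood (3.13) under parameters theta.

    X_l, y_l : labeled instances and their integer labels in {0, ..., C-1}
    X_u      : unlabeled instances
    priors, means, covs : the mixture parameters (theta)
    """
    C = len(priors)
    # Labeled term: sum_i log p(y_i | theta) p(x_i | y_i, theta)
    labeled = sum(
        np.log(priors[y]) + multivariate_normal.logpdf(x, mean=means[y], cov=covs[y])
        for x, y in zip(X_l, y_l)
    )
    # Unlabeled term: sum_i log p(x_i | theta), where the marginal (3.14)
    # p(x | theta) = sum_y p(y | theta) p(x | y, theta) is computed stably
    # in log space with logsumexp.
    unlabeled = 0.0
    for x in X_u:
        log_joint = [
            np.log(priors[j]) + multivariate_normal.logpdf(x, mean=means[j], cov=covs[j])
            for j in range(C)
        ]
        unlabeled += logsumexp(log_joint)
    return labeled + unlabeled
```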
 