In supervised learning when $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$, the MLE is usually easy to find. We can rewrite the log likelihood as
\[
\log p(\mathcal{D} \mid \theta) = \log \prod_{i=1}^{l} p(\mathbf{x}_i, y_i \mid \theta) = \sum_{i=1}^{l} \log p(y_i \mid \theta)\, p(\mathbf{x}_i \mid y_i, \theta), \tag{3.6}
\]
where we used the fact that the probability of a set of i.i.d. events is the product of individual
probabilities. Finding an MLE is an optimization problem to maximize the log likelihood. In
supervised learning, the optimization problem is often straightforward and yields intuitive MLE
solutions, as the next example shows.
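As a concrete illustration of (3.6), the following is a minimal Python sketch (with hypothetical toy data and an arbitrary parameter setting $\theta$, not taken from the text) that evaluates the log likelihood both as the log of the product of joint probabilities and as the sum of log class priors plus log class-conditional densities; the two quantities agree up to floating-point error.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy labeled data D = {(x_i, y_i)}, l = 4, two classes in 2-D (illustrative values only).
X = np.array([[0.5, 1.2], [1.0, 0.8], [-1.1, -0.4], [-0.7, -1.3]])
y = np.array([0, 0, 1, 1])

# An arbitrary parameter setting theta = (class priors, means, covariances).
priors = np.array([0.5, 0.5])
means = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
covs = [np.eye(2), np.eye(2)]

# Right-hand side of (3.6): sum_i [ log p(y_i | theta) + log p(x_i | y_i, theta) ].
ll_factored = sum(
    np.log(priors[yi]) + multivariate_normal.logpdf(xi, mean=means[yi], cov=covs[yi])
    for xi, yi in zip(X, y)
)

# Left-hand side of (3.6): log of the product of joint probabilities p(x_i, y_i | theta).
ll_joint = np.log(np.prod([
    priors[yi] * multivariate_normal.pdf(xi, mean=means[yi], cov=covs[yi])
    for xi, yi in zip(X, y)
]))

print(ll_factored, ll_joint)  # equal up to floating-point error
```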
Example 3.2 (MLE for Gaussian Mixture Model, All Labeled Data). We now present the derivation of the maximum likelihood estimate for a 2-class Gaussian mixture model when $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$. We begin by setting up the constrained optimization problem
\[
\hat{\theta} = \operatorname*{argmax}_{\theta} \; \log p(\mathcal{D} \mid \theta) \quad \text{s.t.} \quad \sum_{j=1}^{2} p(y_j \mid \theta) = 1, \tag{3.7}
\]
where we enforce the constraint that the class priors must sum to 1. We next introduce a Lagrange
multiplier β to form the Lagrangian (see [99] for a tutorial on Lagrange multipliers)
\[
\begin{aligned}
\Lambda(\theta, \beta) &= \log p(\mathcal{D} \mid \theta) - \beta \Big( \sum_{j=1}^{2} p(y_j \mid \theta) - 1 \Big) \\
&= \sum_{i=1}^{l} \log p(\mathbf{x}_i, y_i \mid \theta) - \beta \Big( \sum_{j=1}^{2} p(y_j \mid \theta) - 1 \Big) \\
&= \sum_{i=1}^{l} \log p(y_i \mid \theta)\, p(\mathbf{x}_i \mid y_i, \theta) - \beta \Big( \sum_{j=1}^{2} p(y_j \mid \theta) - 1 \Big) \\
&= \sum_{i=1}^{l} \log \pi_{y_i} + \sum_{i=1}^{l} \log \mathcal{N}(\mathbf{x}_i; \mu_{y_i}, \Sigma_{y_i}) - \beta \Big( \sum_{j=1}^{2} \pi_j - 1 \Big),
\end{aligned}
\]
where $\pi_j, \mu_j, \Sigma_j$ for $j \in \{1, 2\}$ are the class priors and the Gaussian means and covariance matrices. We
compute the partial derivatives with respect to all the parameters. We then set each partial derivative
to zero to obtain the intuitive closed-form MLE solution:
\[
\frac{\partial \Lambda}{\partial \beta} = \sum_{j=1}^{2} \pi_j - 1 = 0 \;\Longrightarrow\; \sum_{j=1}^{2} \pi_j = 1. \tag{3.8}
\]
Clearly, the role of the Lagrange multiplier β is to enforce the normalization constraint on the class priors.
\[
\frac{\partial \Lambda}{\partial \pi_j} = \sum_{i: y_i = j} \frac{1}{\pi_j} - \beta = \frac{l_j}{\pi_j} - \beta = 0 \;\Longrightarrow\; \pi_j = \frac{l_j}{\beta} = \frac{l_j}{l}, \tag{3.9}
\]
where $l_j$ is the number of labeled instances with class label $j$; substituting $\pi_j = l_j / \beta$ into (3.8) gives $\beta = \sum_{j=1}^{2} l_j = l$.
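For completeness, here is a small numerical sketch of this closed-form solution on hypothetical toy data: the class prior MLE of (3.9) is just the fraction of labeled instances in each class, and it satisfies the constraint of (3.8) by construction. The per-class sample mean and (maximum-likelihood) covariance shown at the end are the standard Gaussian MLE results, whose derivation parallels (3.9) and is not reproduced here.

```python
import numpy as np

# Toy labeled data (hypothetical values, for illustration only).
X = np.array([[0.5, 1.2], [1.0, 0.8], [0.9, 1.1], [-1.1, -0.4], [-0.7, -1.3]])
y = np.array([1, 1, 1, 2, 2])   # class labels j in {1, 2}, matching the text
l = len(y)

# Equation (3.9): the class prior MLE is the fraction of labeled instances in class j.
classes = np.unique(y)
priors = {j: np.sum(y == j) / l for j in classes}
assert np.isclose(sum(priors.values()), 1.0)   # constraint (3.8) holds by construction

# The remaining parameters take the analogous intuitive form: per-class sample mean
# and covariance (the ML covariance divides by l_j, hence bias=True).
means = {j: X[y == j].mean(axis=0) for j in classes}
covs = {j: np.cov(X[y == j], rowvar=False, bias=True) for j in classes}

print(priors)             # class priors 0.6 and 0.4 for this toy labeling
print(means[1], covs[1])  # sample mean and ML covariance of class 1
```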
 