The only terms that differ from the ones evaluated in Sect. 7.3.8 are the ones that contain W, and for the classification model they are given by
\[
E_{W,Z}(\ln p(Y | X, W, Z)) = \sum_n \sum_k r_{nk} \sum_j y_{nj} \big( \psi(\alpha_{kj}) - \psi(\tilde{\alpha}_k) \big), \tag{7.124}
\]
\[
E_W(\ln p(W)) = \sum_k \Big( \ln C(\boldsymbol{\alpha}) + \sum_j (\alpha_j - 1) \big( \psi(\alpha_{kj}) - \psi(\tilde{\alpha}_k) \big) \Big), \tag{7.125}
\]
\[
E_W(\ln q(W)) = \sum_k \Big( \ln C(\boldsymbol{\alpha}_k) + \sum_j (\alpha_{kj} - 1) \big( \psi(\alpha_{kj}) - \psi(\tilde{\alpha}_k) \big) \Big), \tag{7.126}
\]
where \(\tilde{\alpha}_k = \sum_j \alpha_{kj}\).
Splitting the variational bound again into the \(\mathcal{L}_k\)'s and \(\mathcal{L}_M\), \(\mathcal{L}_k(q)\) for classifier k is defined as
\[
\mathcal{L}_k(q) = E_{W,Z}(\ln p(Y | X, W, Z)) + E_W(\ln p(W)) - E_W(\ln q(W)), \tag{7.127}
\]
and evaluates to
\[
\mathcal{L}_k(q) = \ln C(\boldsymbol{\alpha}) - \ln C(\boldsymbol{\alpha}_k), \tag{7.128}
\]
where (7.120) was used to simplify the expression. \(\mathcal{L}_M(q)\) remains unchanged and is thus given by (7.95). As before, \(\mathcal{L}(q)\) is given by (7.96).
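The cancellation behind (7.128) can be made explicit. Substituting (7.124) through (7.126) into (7.127) and collecting, for each classifier k, the terms that multiply each digamma difference gives
\[
\mathcal{L}_k(q) = \ln C(\boldsymbol{\alpha}) - \ln C(\boldsymbol{\alpha}_k)
  + \sum_j \Big( \sum_n r_{nk} y_{nj} + \alpha_j - \alpha_{kj} \Big)
    \big( \psi(\alpha_{kj}) - \psi(\tilde{\alpha}_k) \big),
\]
and by the update (7.120), \(\alpha_{kj} = \alpha_j + \sum_n r_{nk} y_{nj}\), so the bracketed factor vanishes for every j, leaving (7.128).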
7.5.4 Independent Classifier Training
As before, the classifiers can be trained independently by replacing \(r_{nk}\) with \(m_k(x_n)\). This only influences the classifier weight vector update (7.120), which becomes
\[
\boldsymbol{\alpha}_k = \boldsymbol{\alpha} + \sum_n m_k(x_n) y_n. \tag{7.129}
\]
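This single-pass update can be sketched numerically. The following is a minimal illustration, not code from the text: the array names (`Y`, `M`, `alpha_prior`) and the random data are assumptions, with `Y` holding one-hot class labels \(y_n\) and `M` holding the matching values \(m_k(x_n)\).

```python
import numpy as np

# Hypothetical dimensions: N observations, K classifiers, Dy classes.
rng = np.random.default_rng(0)
N, K, Dy = 100, 3, 4
Y = np.eye(Dy)[rng.integers(0, Dy, N)]   # one-hot class labels y_n, shape (N, Dy)
M = rng.random((N, K))                   # matching values m_k(x_n) in [0, 1)
alpha_prior = np.full(Dy, 1.0)           # symmetric Dirichlet prior alpha

# Single-pass update (7.129): alpha_k = alpha + sum_n m_k(x_n) y_n,
# computed for all K classifiers at once; row k of alpha_post is alpha_k.
alpha_post = alpha_prior + M.T @ Y       # shape (K, Dy)
```

Because the update is a plain weighted count of the one-hot targets, no iteration over hyperpriors is needed, which is what makes the single pass possible.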
This change invalidates the simplifications performed to get \(\mathcal{L}_k(q)\) by (7.128). Instead,
\[
\mathcal{L}_k(q) = \ln C(\boldsymbol{\alpha}) - \ln C(\boldsymbol{\alpha}_k)
  + \sum_j \Big( \sum_n r_{nk} y_{nj} + \alpha_j - \alpha_{kj} \Big)
    \big( \psi(\alpha_{kj}) - \psi(\tilde{\alpha}_k) \big) \tag{7.130}
\]
has to be used.
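A sketch of evaluating (7.130) for one classifier, under the assumption that the function and argument names are mine (not the text's), with \(\ln C(\boldsymbol{\alpha}) = \ln \Gamma(\sum_j \alpha_j) - \sum_j \ln \Gamma(\alpha_j)\), the log normaliser of a Dirichlet:

```python
import numpy as np
from scipy.special import digamma, gammaln

def ln_C(alpha):
    """Dirichlet log normaliser: ln Gamma(sum alpha_j) - sum ln Gamma(alpha_j)."""
    return gammaln(alpha.sum()) - gammaln(alpha).sum()

def L_k(alpha_prior, alpha_k, resp_y_k):
    """Variational bound (7.130) for a single classifier k.

    alpha_prior : prior parameters alpha, length Dy
    alpha_k     : posterior parameters alpha_k, length Dy
    resp_y_k    : sum_n r_nk * y_nj, responsibility-weighted class counts
    """
    # Correction term of (7.130); it is zero when alpha_k obeys (7.120).
    corr = ((resp_y_k + alpha_prior - alpha_k)
            * (digamma(alpha_k) - digamma(alpha_k.sum()))).sum()
    return ln_C(alpha_prior) - ln_C(alpha_k) + corr
```

When \(\boldsymbol{\alpha}_k\) satisfies the batch update (7.120), the correction term vanishes and the function returns exactly (7.128); under independent training via (7.129) it generally does not.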
If classifiers are trained independently, then they can be trained in a single
pass by (7.129), as no hyperpriors are used. How the mixing model is trained
and the variational bound is evaluated remains unchanged and is described in
Sect. 7.3.10.
7.5.5 Predictive Density
Given a new observation \((y, x)\), its predictive density is given by \(p(y | x, \mathcal{D})\). The density's mixing-model component is essentially the same as in Sect. 7.4.
 