as we have already discussed extensively in Sect. 5.3.5. The weight prior updates (7.40) and (7.41), as well as all mixing model update equations, remain unchanged.
Even though all $r_{nk}$'s in the classifier update equations are replaced by $m_k(x_n)$'s, the classifier-specific component $\mathcal{L}_k(q)$ (7.91) remains unchanged. This is because the responsibilities enter $\mathcal{L}_k(q)$ through the expectation $\mathbb{E}_{W,\tau,Z}(\ln p(Y \mid W, \tau, Z))$, which is based on (7.6) and (7.7). Note that (7.6) combines the classifier models to form a global model, and is thus conceptually part of the mixing model rather than the classifier model. Thus, the $r_{nk}$'s in $\mathcal{L}_k(q)$ specify how classifier $k$ contributes to the global model and remain unchanged.
Consequently, the variational posteriors for the classifiers only maximise the variational bound $\mathcal{L}(q)$ if we have $r_{nk} = m_k(x_n)$ for all $n, k$. In all other cases,
the variational bound remains below the one that we could achieve by training
the classifiers according to their responsibilities. This effect is analogous to the
reduced likelihood as discussed in Sect. 4.4.5. In cases where we only have one
classifier per observation, we automatically have r nk = m k ( x n ), and thus making
classifier training independent only affects areas where several classifiers match
the same input. Nonetheless, the model structure selection criterion is propor-
tional to the value of the variational bound and therefore most likely prefers
model structures that do not assign multiple classifiers to a single observation.
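To illustrate the difference between the two weightings, here is a minimal Python sketch for a single classifier, using an ordinary weighted least-squares fit as a stand-in for the actual variational weight updates; the array names and the way the hypothetical responsibilities are generated are assumptions made purely for illustration.

```python
import numpy as np

def weighted_least_squares(X, y, weights):
    """Fit w minimising sum_n weights[n] * (y[n] - w @ X[n])**2.

    A plain weighted least-squares stand-in for the variational weight
    updates; the weights are the per-observation factors contributed by
    either r_nk or m_k(x_n).
    """
    W = np.diag(weights)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Hypothetical data for a single classifier k: 100 observations, 3 inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

m_k = rng.uniform(size=100)        # matching function values m_k(x_n)
r_k = m_k * rng.uniform(size=100)  # hypothetical responsibilities r_nk (<= m_k(x_n))

w_independent = weighted_least_squares(X, y, m_k)  # independent training
w_coupled = weighted_least_squares(X, y, r_k)      # responsibility-weighted training
# The two fits coincide only if r_nk = m_k(x_n) for all n, that is, when no
# other classifier matches the inputs that classifier k matches.
```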
7.3.10 How to Get $p(\mathcal{M} \mid \mathcal{D})$ for Some $\mathcal{M}$
Recall that rather than finding the model parameters $\theta$ for a fixed model structure, the aim is to find the model structure $\mathcal{M}$ that maximises $p(\mathcal{M} \mid \mathcal{D})$. This, however, cannot be done without also training the model.
Variational Bayesian inference yields a lower bound on $\ln p(\mathcal{D} \mid \mathcal{M})$ that is given by maximising the variational bound $\mathcal{L}(q)$. As $p(\mathcal{M} \mid \mathcal{D})$ results from $p(\mathcal{D} \mid \mathcal{M})$ by (7.3), $p(\mathcal{M} \mid \mathcal{D})$ can be approximated for a given model structure $\mathcal{M}$ by maximising $\mathcal{L}(q)$. Using the assumptions of factorial distributions, $\mathcal{L}(q)$ is maximised with respect to a group of hidden variables while keeping the other ones fixed by computing (7.24). Therefore, by iteratively updating the distribution parameters of $q_{W,\tau}(W, \tau)$, $q_\alpha(\alpha)$, $q_V(V)$, $q_\beta(\beta)$, and $q_Z(Z)$ in a sequential fashion, the variational bound increases monotonically until it reaches a maximum [26].
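As a rough illustration, this sequential scheme can be sketched as a coordinate-ascent loop; `update_classifiers`, `update_mixing`, and `bound` below are assumed placeholders for the respective variational update equations and for the bound (7.96), not functions defined in this chapter.

```python
def variational_coordinate_ascent(data, q, update_classifiers, update_mixing,
                                  bound, max_iter=100, tol=1e-6):
    """Maximise the variational bound L(q) by sequential factor updates.

    `q` collects the parameters of all factors q_{W,tau}, q_alpha, q_V,
    q_beta, and q_Z. Each update maximises L(q) with respect to one group
    of hidden variables while the others are held fixed, so the bound can
    only increase from one iteration to the next.
    """
    prev_bound = -float("inf")
    for _ in range(max_iter):
        q = update_classifiers(data, q)  # update q_{W,tau}(W, tau) and q_alpha(alpha)
        q = update_mixing(data, q)       # update q_V(V), q_beta(beta), and q_Z(Z)
        current = bound(data, q)         # evaluate the variational bound, cf. (7.96)
        if current - prev_bound < tol:   # monotone increase; stop at convergence
            break
        prev_bound = current
    return q, current
```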
Independent classifier training simplifies this procedure by making the update of $q_{W,\tau}(W, \tau)$ and $q_\alpha(\alpha)$ independent of the update of the other variational densities. Firstly, the classifiers are trained independently of each other and the mixing model, and secondly, the mixing model parameters are updated accordingly.
To summarise, finding $p(\mathcal{M} \mid \mathcal{D})$ for a given model structure can be done with the following steps (a code sketch follows the list):
1. Train the classifiers by iteratively updating the distribution parameters of $q_{W,\tau}(W, \tau)$ and $q_\alpha(\alpha)$ until convergence, for each classifier separately.
2. Train the mixing model by iteratively updating the distribution parameters of $q_V(V)$, $q_\beta(\beta)$, and $q_Z(Z)$ until convergence.
3. Compute the variational bound $\mathcal{L}(q)$ by (7.96).
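The three steps above might be sketched as follows; `train_classifier`, `train_mixing_model`, `variational_bound`, and `log_prior` are assumed helpers standing in for the corresponding update equations, for (7.96), and for the model structure prior in (7.3), not functions given in the text.

```python
def log_model_posterior(data, model_structure, train_classifier,
                        train_mixing_model, variational_bound, log_prior):
    """Approximate ln p(M|D) up to an additive constant.

    Step 1: train each classifier independently until its q_{W,tau}(W, tau)
            and q_alpha(alpha) have converged.
    Step 2: train the mixing model, i.e. q_V(V), q_beta(beta), and q_Z(Z),
            until convergence.
    Step 3: evaluate the variational bound L(q), a lower bound on ln p(D|M)
            that serves as its approximation.
    By (7.3), p(M|D) is proportional to p(D|M) p(M).
    """
    classifiers = [train_classifier(data, k) for k in model_structure.classifiers]
    mixing = train_mixing_model(data, classifiers)
    log_evidence = variational_bound(data, classifiers, mixing)
    return log_evidence + log_prior(model_structure)
```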