usually not directly accessible. The graph from Fig. 3.1(b) that shows how both
risk measures change with d is reproduced in Fig. 7.1(a) for convenience.
Using a Bayesian model of the data-generating process, one can assess the probability of the data supporting the polynomial having a particular degree by Bayesian model selection. The model used for this task is the same as the one later introduced for linear regression classifiers and is thus not discussed in detail here. Variational Bayesian inference, as described in Sect. 7.3.1, is used to evaluate a lower "variational" bound $\mathcal{L}(q)$ on the model log-probability, that is, $\mathcal{L}(q) \leq \ln p(\mathcal{D}|\mathcal{M}) = \ln p(\mathcal{M}|\mathcal{D}) + \text{const.}$ under the assumption of a uniform model prior $p(\mathcal{M})$. As shown in Fig. 7.1(b), $\mathcal{L}(q)$ is highest for $d = 2$, which demonstrates that Bayesian model selection correctly identifies the data-generating model.
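As a concrete illustration of this kind of degree selection, the sketch below scores polynomial models of degree $d$ by their log-evidence, using a conjugate Bayesian linear regression model for which $\ln p(\mathcal{D}|\mathcal{M})$ has a closed form, so no variational bound is required. The hyperparameters (prior precision `alpha`, noise precision `beta`) and the synthetic degree-2 data are assumptions made for this example, not values taken from the text.

```python
import numpy as np

def log_evidence(X, y, alpha=1.0, beta=25.0):
    """Closed-form ln p(D|M) for Bayesian linear regression with
    prior w ~ N(0, alpha^-1 I) and known noise precision beta."""
    N, D = X.shape
    A = alpha * np.eye(D) + beta * X.T @ X        # posterior precision
    m = beta * np.linalg.solve(A, X.T @ y)        # posterior mean
    E = beta / 2 * np.sum((y - X @ m) ** 2) + alpha / 2 * m @ m
    _, logdetA = np.linalg.slogdet(A)
    return (D / 2 * np.log(alpha) + N / 2 * np.log(beta)
            - E - logdetA / 2 - N / 2 * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = 1 - 2 * x + 3 * x**2 + rng.normal(0, 0.2, x.size)   # degree-2 generator

for d in range(6):
    X = np.vander(x, d + 1, increasing=True)  # features 1, x, ..., x^d
    print(f"d = {d}: ln p(D|M) = {log_evidence(X, y):.1f}")
```

The printed evidence should peak at $d = 2$, mirroring Fig. 7.1(b): higher degrees fit the data no better but pay an automatic complexity penalty through the evidence.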
7.1.4 Applying Bayesian Model Selection to Finding the Best Set of Classifiers
Applied to LCS, the model structure is, as previously described, defined by the number of classifiers $K$ and their matching functions $\boldsymbol{M} = \{ m_k : \mathcal{X} \to [0, 1] \}$, giving $\mathcal{M} = \{ K, \boldsymbol{M} \}$. In order to find the best set of classifiers, we need to maximise its probability density with respect to the data (7.1), which is equivalent to maximising its logarithm,

$$\ln p(\mathcal{M}|\mathcal{D}) = \ln p(\mathcal{D}|\mathcal{M}) + \ln p(\mathcal{M}) + \text{const.}, \qquad (7.3)$$
where the constant term captures the normalising constant and can be ignored
when comparing the different model structures, as it is shared between them.
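In code, the comparison reduces to ranking candidate structures by the unnormalised score $\ln p(\mathcal{D}|\mathcal{M}) + \ln p(\mathcal{M})$ from (7.3); the shared constant never has to be computed. A minimal sketch, assuming a geometric prior over $K$ and using made-up evidence values purely as placeholders:

```python
import math

def structure_score(log_evidence, K):
    """Unnormalised ln p(M|D) from (7.3), dropping the shared constant.
    Assumes an illustrative geometric prior p(K) = (1/2)^K; any prior
    vanishing for large K would serve the same purpose."""
    return log_evidence - K * math.log(2)

# Hypothetical candidates: name -> (ln p(D|M), number of classifiers K).
candidates = {"M_a": (-120.4, 3), "M_b": (-118.9, 7)}
best = max(candidates, key=lambda name: structure_score(*candidates[name]))
print(best)  # structure with the highest unnormalised log-posterior
```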
Evaluating the log-evidence $\ln p(\mathcal{D}|\mathcal{M})$ in (7.3) requires us to first specify a parameter prior $p(\theta|\mathcal{M})$, and then to evaluate (7.2) to get the evidence of $\mathcal{M}$.
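A minimal sketch of what evaluating the evidence involves, assuming (7.2) is the usual marginalisation $p(\mathcal{D}|\mathcal{M}) = \int p(\mathcal{D}|\theta, \mathcal{M})\, p(\theta|\mathcal{M})\, d\theta$: draw parameters from the prior and average their likelihoods. This naive estimator is for illustration only; it scales poorly, which is one reason the chapter turns to variational methods.

```python
import numpy as np
from scipy.special import logsumexp

def mc_log_evidence(log_lik, sample_prior, n=10_000):
    """Naive Monte Carlo estimate of ln p(D|M): average the likelihood
    p(D|theta,M) over n parameter draws theta_i ~ p(theta|M)."""
    thetas = sample_prior(n)                     # samples from p(theta|M)
    return logsumexp([log_lik(t) for t in thetas]) - np.log(n)
```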
Unfortunately, the LCS model described in Chap. 4 is not fully Bayesian and needs to be reformulated before the evidence can be evaluated. Additionally, the resulting probabilistic model structure does not provide a closed-form solution to (7.2). Thus, the rest of this chapter is devoted to i) introducing a fully Bayesian LCS model, and ii) applying an approximation method called Variational Bayesian inference that gives us a closed-form expression for the evidence. Before we do so, let us discuss the prior $p(\mathcal{M})$ on the model structure itself, and why the requirement of specifying parameter and model structure priors is not an inherent weakness of the method.
7.1.5 The Model Structure Prior $p(\mathcal{M})$
Specifying the prior $p(\mathcal{M})$ lets us express our belief about which model structures are best at representing the data, prior to knowledge of the data. Recall that $\mathcal{M} = \{ \boldsymbol{M}, K \}$, and thus $p(\mathcal{M})$ can be decomposed into $p(\mathcal{M}) = p(\boldsymbol{M}|K)\, p(K)$. Our belief about the number of classifiers $K$ is that this number is certainly always finite, which requires $p(K) \to 0$ with $K \to \infty$.
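Any prior whose tail vanishes encodes this belief; a Poisson distribution over $K$ is one possibility, sketched below (the rate `lam` is an assumed hyperparameter, not one prescribed by the text):

```python
import math

def ln_p_K(K, lam=4.0):
    """Illustrative Poisson prior ln p(K) = K ln(lam) - lam - ln(K!),
    which decays to zero as K -> infinity, so arbitrarily large
    classifier sets are effectively ruled out a priori."""
    return K * math.log(lam) - lam - math.lgamma(K + 1)
```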
The beliefs about the set of matching functions $\boldsymbol{M}$ given some $K$ are less clear. Let us only