change. Picks one classifier of $M_t$ at random, and re-initialises its matching function by taking a sample from $p(m_k)$.

add. Adds one classifier to $M_t$, with a matching function sampled from $p(m_k)$, resulting in $K_t + 1$ classifiers.

remove. Removes one classifier from $M_t$ at random, resulting in $K_t - 1$ classifiers.
The actions are chosen by taking samples from the discrete random variable $A \in \{\text{change}, \text{add}, \text{remove}\}$, where we assume $p(A = \text{add}) = p(A = \text{remove})$ and $p(A = \text{change}) = \frac{1}{2} p(A = \text{add})$.
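As a minimal sketch of how these actions could be sampled, the two stated constraints together with normalisation fix the action probabilities to $p(A=\text{add}) = p(A=\text{remove}) = 2/5$ and $p(A=\text{change}) = 1/5$ (the numeric values are derived here, not stated in the text):

```python
import random

# Hypothetical action probabilities satisfying the stated constraints:
#   p(add) = p(remove)  and  p(change) = (1/2) p(add).
# Normalisation gives p(add) + p(remove) + p(add)/2 = 1, so p(add) = 2/5.
P_ADD = P_REMOVE = 2 / 5
P_CHANGE = 1 / 5

def sample_action(rng=random):
    """Sample an action A from {change, add, remove}."""
    u = rng.random()
    if u < P_ADD:
        return "add"
    if u < P_ADD + P_REMOVE:
        return "remove"
    return "change"
```

The function names here are illustrative only; any sampler that respects the two constraints would do.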
Let us now consider how to compute the acceptance probability (8.4) for each of these actions. We have $p(M|D) \propto p(D|M)\,p(M|K)\,p(K)$ by Bayes' Theorem, where, different to (7.3), we have separated the number of classifiers $K$ from the model structure $M$. As in (7.4), a uniform prior over unique models is assumed, resulting in $p(K) \propto 1/K!$. Additionally, every classifier in $M$ is created independently by sampling from $p(m_k)$, which results in $p(M|K) = p(m_k)^K$. Using variational inference, the model evidence is approximated by the variational bound $p(D|M) \propto \exp(\mathcal{L}_M(q))$, where $\mathcal{L}_M(q)$ denotes the variational bound of model $M$. Thus, in combination we have
$$\frac{p(M|D)}{p(M_t|D)} \approx \frac{\exp(\mathcal{L}_M(q))\, p(m_k)^K \,(K!)^{-1}}{\exp(\mathcal{L}_{M_t}(q))\, p(m_k)^{K_t}\,(K_t!)^{-1}}, \qquad (8.5)$$
where $K$ denotes the number of classifiers in $M$.
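The log of the ratio of posteriors in (8.5) can be evaluated numerically; a minimal sketch, assuming a shared value of $\ln p(m_k)$ for all classifiers as in the text, and using `lgamma` to evaluate $\ln K!$ stably:

```python
from math import lgamma

def log_posterior_ratio(L_M, L_Mt, K, K_t, log_p_mk):
    """Log of (8.5): ln p(M|D) - ln p(M_t|D), with the shared
    normalisation constant cancelling.  L_M and L_Mt are the
    variational bounds of the two models, log_p_mk is ln p(m_k).
    Note ln K! = lgamma(K + 1)."""
    return (L_M - L_Mt
            + (K - K_t) * log_p_mk
            - (lgamma(K + 1) - lgamma(K_t + 1)))
```

Working in log space avoids overflow in $\exp(\mathcal{L}_M(q))$ and $K!$ for realistically sized bounds and classifier counts.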
We get the model transition probability $p(M|M_t)$ by marginalising over the actions $A$, to get
$$p(M|M_t) = p(M|M_t, A=\text{change})\,p(A=\text{change}) + p(M|M_t, A=\text{add})\,p(A=\text{add}) + p(M|M_t, A=\text{remove})\,p(A=\text{remove}), \qquad (8.6)$$
and a similar expression for $p(M_t|M)$. When choosing action add, then $K = K_t + 1$, and $p(M|M_t, A=\text{change}) = p(M|M_t, A=\text{remove}) = 0$, as neither the action change nor the action remove cause a classifier to be added. $M_t$ and $M$ differ in a single classifier that is picked from $p(m_k)$, and therefore $p(M|M_t, A=\text{add}) = p(m_k)$. Similarly, when choosing the action remove for $M_t$, an arbitrary classifier is picked with probability $1/K_t$, and therefore $p(M|M_t, A=\text{remove}) = 1/K_t$. The action change requires choosing a classifier with probability $1/K_t$ and reinitialising it with probability $p(m_k)$, giving $p(M|M_t, A=\text{change}) = p(m_k)/K_t$. The reverse transitions $p(M_t|M)$ can be evaluated by observing that the only possible action that causes the reverse transition from $M$ to $M_t$ after the action add is the action remove, and vice versa. Equally, change causes the reverse transition after performing action change.
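The three forward transition densities derived above can be summarised in a small helper; a sketch, where `p_mk` stands for $p(m_k)$ evaluated at the sampled matching function (the function name is illustrative):

```python
def transition_prob(action, K_t, p_mk):
    """Forward transition density p(M|M_t, A) for a candidate M
    produced from M_t (with K_t classifiers) by the given action."""
    if action == "add":
        return p_mk           # new classifier sampled from p(m_k)
    if action == "remove":
        return 1 / K_t        # uniform choice among K_t classifiers
    if action == "change":
        return p_mk / K_t     # choose classifier, then re-sample from p(m_k)
    raise ValueError(f"unknown action: {action}")
```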
Overall, the candidate model $M$ that was created by add from $M_t$ is accepted by (8.4) with probability
$$\min\left( \frac{p(M|D)\, p(M_t|M, A=\text{remove})\, p(A=\text{remove})}{p(M_t|D)\, p(M|M_t, A=\text{add})\, p(A=\text{add})},\ 1 \right) \approx \min\left( \exp\left( \mathcal{L}_M(q) - \mathcal{L}_{M_t}(q) - 2\ln(K_t+1) \right),\ 1 \right), \qquad (8.7)$$
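The right-hand side of (8.7) is cheap to evaluate, since the $p(m_k)$ and $p(A)$ terms cancel; a minimal sketch that computes it in log space to avoid overflow when the bound difference is large:

```python
from math import exp, log

def accept_add(L_M, L_Mt, K_t):
    """Acceptance probability (8.7) for a candidate created by add:
    min(exp(L_M - L_Mt - 2 ln(K_t + 1)), 1), with K_t the number of
    classifiers in the current model M_t."""
    log_alpha = L_M - L_Mt - 2 * log(K_t + 1)
    return 1.0 if log_alpha >= 0 else exp(log_alpha)
```

For equal variational bounds the candidate is still accepted with probability $(K_t+1)^{-2}$, so the $-2\ln(K_t+1)$ term acts as a complexity penalty on growing the classifier set.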