change. Picks one classifier of $\mathcal{M}_t$ at random, and re-initialises its matching function by taking a sample from $p(m_k)$.

add. Adds one classifier to $\mathcal{M}_t$, with a matching function sampled from $p(m_k)$, resulting in $K_t + 1$ classifiers.

remove. Removes one classifier from $\mathcal{M}_t$ at random, resulting in $K_t - 1$ classifiers.

The actions are chosen by taking samples from the discrete random variable $A \in \{\text{change}, \text{add}, \text{remove}\}$, where we assume $p(A = \text{add}) = p(A = \text{remove})$ and $p(A = \text{change}) = 1 - 2\, p(A = \text{add})$.
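To make the proposal step concrete, here is a minimal sketch (not from the text): it assumes a model structure is represented as a plain list of matching functions, and `sample_matching` is a hypothetical stand-in for drawing from $p(m_k)$.

```python
import random

def sample_matching():
    # Hypothetical stand-in for a draw from p(m_k); here, a random
    # interval on [0, 1] serves as a classifier's matching function.
    lo = random.random()
    return (lo, min(1.0, lo + random.random()))

def propose(model, p_add=0.25):
    """Sample an action A and apply it to a copy of the model structure M_t.

    Action probabilities: p(A=add) = p(A=remove) = p_add,
    p(A=change) = 1 - 2 * p_add. Boundary cases (e.g. K_t = 1) are ignored.
    """
    action = random.choices(["change", "add", "remove"],
                            weights=[1.0 - 2.0 * p_add, p_add, p_add])[0]
    new_model = list(model)
    if action == "change":
        # Re-initialise the matching function of one random classifier.
        new_model[random.randrange(len(new_model))] = sample_matching()
    elif action == "add":
        # Add a classifier with a freshly sampled matching function.
        new_model.append(sample_matching())
    else:
        # Remove a random classifier, leaving K_t - 1 classifiers.
        new_model.pop(random.randrange(len(new_model)))
    return action, new_model
```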
Let us now consider how to compute the acceptance probability (8.4) for each of these actions. We have $p(\mathcal{M}|\mathcal{D}) \propto p(\mathcal{D}|\mathcal{M})\, p(\mathcal{M}|K)\, p(K)$ by Bayes' Theorem, where, different to (7.3), we have separated the number of classifiers $K$ from the model structure $\mathcal{M}$. As in (7.4), a uniform prior over unique models is assumed, resulting in $p(K) \propto 1/K!$. Additionally, every classifier in $\mathcal{M}$ is created independently by sampling from $p(m_k)$, which results in $p(\mathcal{M}|K) = p(m_k)^K$.
Using variational inference, the model evidence is approximated by the variational bound $p(\mathcal{D}|\mathcal{M}) \approx \exp(\mathcal{L}_\mathcal{M}(q))$, where $\mathcal{L}_\mathcal{M}(q)$ denotes the variational bound of model $\mathcal{M}$. Thus, in combination we have
$$
\frac{p(\mathcal{M}|\mathcal{D})}{p(\mathcal{M}_t|\mathcal{D})} \approx \frac{\exp(\mathcal{L}_\mathcal{M}(q))\, p(m_k)^K (K!)^{-1}}{\exp(\mathcal{L}_{\mathcal{M}_t}(q))\, p(m_k)^{K_t} (K_t!)^{-1}}, \tag{8.5}
$$
where $K$ denotes the number of classifiers in $\mathcal{M}$.
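In an implementation, the ratio (8.5) is best handled on the log scale to avoid overflow. A minimal sketch, assuming the variational bounds are available as plain numbers and that $p(m_k)$ is a constant whose logarithm is passed in as `log_p_mk`:

```python
import math

def log_posterior_ratio(L_new, L_old, K_new, K_old, log_p_mk):
    """Log of the right-hand side of (8.5): the candidate's variational bound
    and prior terms minus those of the current model M_t."""
    return (L_new - L_old                      # L_M(q) - L_{M_t}(q)
            + (K_new - K_old) * log_p_mk       # ln p(m_k)^K - ln p(m_k)^{K_t}
            + math.lgamma(K_old + 1)           # + ln K_t!
            - math.lgamma(K_new + 1))          # - ln K!
```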
We get the model transition probability $p(\mathcal{M}|\mathcal{M}_t)$ by marginalising over the actions $A$, to get
$$
\begin{aligned}
p(\mathcal{M}|\mathcal{M}_t) ={}& p(\mathcal{M}|\mathcal{M}_t, A = \text{change})\, p(A = \text{change}) \\
&+ p(\mathcal{M}|\mathcal{M}_t, A = \text{add})\, p(A = \text{add}) \\
&+ p(\mathcal{M}|\mathcal{M}_t, A = \text{remove})\, p(A = \text{remove}),
\end{aligned} \tag{8.6}
$$
and a similar expression for $p(\mathcal{M}_t|\mathcal{M})$. When choosing action add, then $K = K_t + 1$, and $p(\mathcal{M}|\mathcal{M}_t, A = \text{change}) = p(\mathcal{M}|\mathcal{M}_t, A = \text{remove}) = 0$, as neither the action change nor the action remove causes a classifier to be added. $\mathcal{M}_t$ and $\mathcal{M}$ differ in a single classifier that is picked from $p(m_k)$, and therefore $p(\mathcal{M}|\mathcal{M}_t, A = \text{add}) = p(m_k)$. Similarly, when choosing the action remove for $\mathcal{M}_t$, an arbitrary classifier is picked with probability $1/K_t$, and therefore $p(\mathcal{M}|\mathcal{M}_t, A = \text{remove}) = 1/K_t$. The action change requires choosing a classifier with probability $1/K_t$ and reinitialising it with probability $p(m_k)$, giving $p(\mathcal{M}|\mathcal{M}_t, A = \text{change}) = p(m_k)/K_t$. The reverse transitions $p(\mathcal{M}_t|\mathcal{M})$ can be evaluated by observing that the only possible action that causes the reverse transition from $\mathcal{M}$ to $\mathcal{M}_t$ after the action add is the action remove, and vice versa. Equally, change causes the reverse transition after performing action change.
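The forward and reverse proposal densities derived above can be tabulated per action. The sketch below assumes, as in the cancellation leading to (8.7) below, that $p(m_k)$ takes the same value for the old and the new matching function and is passed in as a single number `p_mk`:

```python
def proposal_densities(action, K_t, p_mk):
    """Return (p(M | M_t, A), p(M_t | M, A')) for a proposal generated by
    `action`, where A' is the unique action reversing the transition."""
    if action == "add":
        # Forward: sample the new classifier from p(m_k).
        # Reverse: remove it, picking 1 of the K_t + 1 classifiers of M.
        return p_mk, 1.0 / (K_t + 1)
    if action == "remove":
        # Forward: pick 1 of the K_t classifiers of M_t.
        # Reverse: add it back by sampling its matching function from p(m_k).
        return 1.0 / K_t, p_mk
    # change: pick 1 of K_t classifiers and re-sample its matching function;
    # the reverse transition is another change with the same density.
    return p_mk / K_t, p_mk / K_t
```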
Overall, the candidate model $\mathcal{M}$ that was created by add from $\mathcal{M}_t$ is accepted by (8.4) with probability
$$
\min\!\left( \frac{p(\mathcal{M}|\mathcal{D})\, p(\mathcal{M}_t|\mathcal{M}, A = \text{remove})\, p(A = \text{remove})}{p(\mathcal{M}_t|\mathcal{D})\, p(\mathcal{M}|\mathcal{M}_t, A = \text{add})\, p(A = \text{add})}, 1 \right)
= \min\!\left( \exp\!\left( \mathcal{L}_\mathcal{M}(q) - \mathcal{L}_{\mathcal{M}_t}(q) - 2 \ln(K_t + 1) \right), 1 \right), \tag{8.7}
$$
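Note that in this ratio the factor $p(m_k)$ from (8.5) cancels against the add proposal density, and $p(A = \text{add})$ cancels against $p(A = \text{remove})$, leaving only the variational bounds and $K_t$. A minimal sketch of the resulting acceptance test, evaluated on the log scale under the same assumptions as above:

```python
import math
import random

def accept_add(L_new, L_old, K_t):
    """Metropolis-Hastings acceptance test for a model proposed by add,
    following (8.7); K_t is the number of classifiers before the add."""
    log_alpha = L_new - L_old - 2.0 * math.log(K_t + 1)
    return math.log(random.random()) < min(log_alpha, 0.0)
```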
 