Function          O(·)                   Comments
ModelProbability  N K^3 D_X^3 D_Y D_V    K^3 D_V from TrainMixing, D_X^3 from TrainClassifier
TrainClassifier   N D_X^3 D_Y            D_X^3 due to Λ_k^-1
TrainMixing       N K^3 D_X^2 D_Y D_V    K^3 D_X^2 D_V from TrainMixWeights
Mixing            N K D_V
Responsibilities  N K D_X^2 D_Y          D_X^2 due to Λ_k^-1
TrainMixWeights   N K^3 D_X^2 D_Y D_V    (K D_V)^3 due to H^-1, D_X^2 from Responsibilities
Hessian           N K^2 D_V              K^2 due to nested iteration, D_V due to Φ^T(Φ ⊙ (g_k ⊙ g_j))
TrainMixPriors    K D_V
VarClBound        N D_X^2 D_Y            D_X^2 due to Λ_k^-1 or |Λ_k^-1|
VarMixBound       N K^2 D_V              (K D_V)^2 due to |Λ_V^-1|
Fig. 8.1. Complexity of the different functions with respect to the number of observations N, the number of classifiers K, the dimensionality of the input space D_X, the dimensionality of the output space D_Y, and the dimensionality of the mixing feature space D_V
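To make the figures in the table concrete, the following minimal sketch (with purely hypothetical problem sizes) evaluates the dominant term of ModelProbability and shows that doubling the number of classifiers K multiplies the cost by eight, independently of the other dimensions:

```python
# Rough cost model based on the dominant term of ModelProbability in Fig. 8.1.
# The problem sizes below are hypothetical and only illustrate the cubic
# growth in the number of classifiers K.

def model_probability_cost(N, K, D_X, D_Y, D_V):
    """Dominant operation count O(N K^3 D_X^3 D_Y D_V) from Fig. 8.1."""
    return N * K**3 * D_X**3 * D_Y * D_V

base = model_probability_cost(N=1000, K=10, D_X=2, D_Y=1, D_V=1)
doubled = model_probability_cost(N=1000, K=20, D_X=2, D_Y=1, D_V=1)
print(doubled / base)  # 8.0: doubling K makes the call 8 times as expensive
```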
To apply variational inference to real-world problems, the algorithm would be required to scale linearly with the number of classifiers K. This is best achieved by approximating the optimal mixing weights by well-tuned heuristics, as was already done for the prior-free LCS model in Chap. 6. What remains to be done is to find similar heuristics that honour the prior. The mixing feature space dimensionality, on the other hand, is usually D_V = 1, and its influence is therefore negligible.
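As an illustration of the kind of heuristic meant here, the sketch below implements inverse-variance mixing in the spirit of the heuristics discussed for the prior-free model: each matched classifier is weighted by its estimated noise precision, at a cost of only O(NK). The array layout and the names M and noise_var are assumptions for illustration, and this variant does not yet honour the prior:

```python
import numpy as np

# Minimal sketch of a heuristic mixing model: mixing weights proportional to
# the inverse noise variance of each matched classifier. Names and array
# layout are illustrative assumptions, not the book's algorithm.

def inverse_variance_mixing(M, noise_var):
    """Return an (N, K) matrix of mixing weights.

    M         : (N, K) 0/1 matching matrix, M[n, k] = m_k(x_n)
    noise_var : (K,)   estimated noise variance of each classifier
    """
    G = M / noise_var            # weight matched classifiers by inverse variance
    G_sum = G.sum(axis=1, keepdims=True)
    G_sum[G_sum == 0.0] = 1.0    # observations matched by no classifier
    return G / G_sum             # normalise each row to sum to 1 (or 0)

# Cost is O(NK), compared with the O(N K^3 D_X^2 D_Y D_V) of TrainMixWeights.
```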
In summary, the presented algorithm scales with O(N K^3 D_X^3 D_Y D_V). While it might be possible to reduce D_X^3 to D_X^2, it still scales super-linearly with the number of classifiers K. This is due to the use of the generalised softmax function, which requires the application of the IRLS algorithm to find its parameters. To reduce the complexity, the softmax function needs to be replaced either by another model that is easier to train or by well-tuned heuristics that provide a good approximation to it.
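To see where the cubic scaling in K enters, consider a single IRLS/Newton update for a plain softmax gating model. This is a simplified stand-in for the generalised softmax, omitting the matching functions and the prior on the mixing weights, and all names are illustrative. The Hessian consists of K × K blocks of size D_V × D_V, so solving for the Newton step already costs O((K D_V)^3):

```python
import numpy as np

# One Newton/IRLS step for plain softmax gating weights. The (K D_V)-sized
# linear system at the end is the source of the cubic scaling in K.

def irls_step(V, Phi, R):
    """V: (K, D_V) mixing weights, Phi: (N, D_V) features, R: (N, K) targets."""
    D_V = Phi.shape[1]
    K = V.shape[0]

    A = Phi @ V.T                                   # (N, K) activations v_k^T phi(x_n)
    G = np.exp(A - A.max(axis=1, keepdims=True))
    G /= G.sum(axis=1, keepdims=True)               # softmax gating g_k(x_n)

    grad = (Phi.T @ (G - R)).T.reshape(K * D_V)     # stacked gradient, length K*D_V

    H = np.zeros((K * D_V, K * D_V))                # K x K blocks of size D_V x D_V
    for k in range(K):
        for j in range(K):
            w = G[:, k] * ((k == j) - G[:, j])      # g_k (delta_kj - g_j)
            H[k*D_V:(k+1)*D_V, j*D_V:(j+1)*D_V] = Phi.T @ (Phi * w[:, None])
    H += 1e-6 * np.eye(K * D_V)                     # ridge: the softmax Hessian is singular

    step = np.linalg.solve(H, grad)                 # the O((K D_V)^3) bottleneck
    return V - step.reshape(K, D_V)
```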
8.2 Two Alternatives for Model Structure Search
Recall that the optimal set of classifiers M was defined as the set that maximises p(M|D). Therefore, in order to find this optimal set we need to search the space {M} for the M̄ such that p(M̄|D) ≥ p(M|D) for all M. This can theoretically