Function VarMixBound(G, R, V, Λ_V^−1, a_β, b_β)

Input: mixing matrix G, responsibilities matrix R, mixing weight matrix V, mixing covariance matrix Λ_V^−1, mixing weight prior parameters a_β, b_β
Output: mixing component L_M(q) of variational bound

1   get D_V, K from shape of V
2   L_{M,1}(q) ← K(−ln Γ(a_β) + a_β ln b_β)
3   for k = 1 to K do
4       a_{β_k}, b_{β_k} ← pick from a_β, b_β
5       L_{M,1}(q) ← L_{M,1}(q) + ln Γ(a_{β_k}) − a_{β_k} ln b_{β_k}
6   L_{M,2}(q) ← Sum(R ⊗ FixNaN(ln(G ⊘ R), 0))
7   L_{M,3}(q) ← (1/2) ln |Λ_V^−1| + K D_V / 2
8   return L_{M,1}(q) + L_{M,2}(q) + L_{M,3}(q)
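To make the steps above concrete, here is a minimal NumPy/SciPy sketch of the function (an illustration, not the book's implementation). It assumes V is stored as a K × D_V matrix, that a_beta and b_beta are length-K arrays of the posterior Gamma parameters, and that A_BETA and B_BETA stand for the scalar prior hyperparameters; the values chosen for them here are illustrative. FixNaN is emulated with np.nan_to_num.

```python
import numpy as np
from scipy.special import gammaln

# Scalar prior hyperparameters of the mixing weight priors
# (illustrative values; an assumption, not taken from the text).
A_BETA, B_BETA = 1e-2, 1e-4

def var_mix_bound(G, R, V, a_beta, b_beta, Lambda_V_inv):
    """Sketch of VarMixBound: returns the mixing component L_M(q).

    G, R           -- N x K mixing and responsibility matrices
    V              -- mixing weight matrix, assumed K x D_V (only its shape is used)
    a_beta, b_beta -- length-K arrays of posterior Gamma parameters
    Lambda_V_inv   -- (K * D_V) x (K * D_V) mixing covariance matrix
    """
    K, D_V = V.shape                                        # Line 1
    # Lines 2-5: terms that depend on the parameters of q_beta(beta)
    L_M1 = K * (-gammaln(A_BETA) + A_BETA * np.log(B_BETA))
    L_M1 += np.sum(gammaln(a_beta) - a_beta * np.log(b_beta))
    # Line 6: Sum(R (x) FixNaN(ln(G / R), 0)); NaNs arise only in cells
    # where g_nk = r_nk = 0, and FixNaN sets those cells to 0
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.log(G / R)
    L_M2 = np.sum(R * np.nan_to_num(log_ratio, nan=0.0))
    # Line 7: (1/2) ln|Lambda_V^-1| + K D_V / 2
    _, logdet = np.linalg.slogdet(Lambda_V_inv)
    L_M3 = 0.5 * logdet + K * D_V / 2.0
    return L_M1 + L_M2 + L_M3                               # Line 8
```

With the posterior parameters equal to the prior, the Lines 2 to 5 terms cancel, which gives a quick sanity check of the implementation.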
Function VarMixBound computes the mixing component L_M(q) of L(q) by evaluating (7.95). As in VarClBound, the computation of L_M(q) is split into the components L_{M,1}(q), L_{M,2}(q), and L_{M,3}(q), whose sum is returned. L_{M,1}(q) contains the components of L_M(q) that depend on the parameters of q_β(β), and is computed in Lines 2 to 5 by iterating over all k. L_{M,2}(q) is the Kullback-Leibler divergence KL(R‖G), as given by (7.84), which is computed in the same way as in Line 17 of Function TrainMixWeights.
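Regarding the FixNaN convention in this divergence: cells with r_nk = g_nk = 0 produce ln(0/0) = NaN, which FixNaN maps to 0 so that such cells contribute nothing. A small sketch of this computation (note that Σ r ln(g/r) is the negative of the KL divergence in the usual sign convention, and is therefore at most 0):

```python
import numpy as np

def neg_kl_R_G(R, G):
    """Sum(R (x) FixNaN(ln(G / R), 0)), i.e. -KL(R||G) with 0*ln(0/0) := 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.log(G / R)
    return float(np.sum(R * np.nan_to_num(log_ratio, nan=0.0)))

# Cells where both r_nk and g_nk are zero contribute nothing:
R = np.array([[1.0, 0.0], [0.5, 0.5]])
print(neg_kl_R_G(R, R))  # identical matrices give divergence 0.0
```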
8.1.5 Scaling Issues
Let us now consider how the presented algorithm scales with the dimensionality of the input space D_X, the dimensionality of the output space D_Y, the dimensionality of the mixing feature space D_V, the number N of observations that are available, and the number K of classifiers. All O(·) estimates are based on the observation that the multiplication of an a × b matrix with a b × c matrix scales with O(abc), and that the inversion and the computation of the determinant of an a × a matrix have complexity O(a³) and O(a²), respectively.
Table 8.1 gives an overview of how the different functions scale with N, K, D_X, D_Y, and D_V. Unfortunately, even though ModelProbability scales linearly with N and D_Y, it scales well neither with D_X nor with K and D_V. In all three cases, the cubic dependency is caused by a matrix inversion.
Considering that D_X³ is due to inverting the precision matrix Λ_k, it might be reducible to D_X² by using the Sherman-Morrison formula, as shown in Sect. 5.3.5. D_X is the dimensionality of the input space with respect to the classifier model, and is given by D_X = 1 for averaging classifiers, and by D_X = 2 for classifiers that model straight lines. Thus, it is in general not too high, and D_X³ will not be the most influential complexity component. In any case, as long as we are required to maintain a covariance matrix Λ_k^−1 of size D_X × D_X, the influence of D_X is unlikely to be reducible below D_X².
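The Sherman-Morrison formula referred to above replaces a full O(D³) re-inversion by an O(D²) rank-one update of an already known inverse. A generic sketch (illustrative, not the book's code):

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Given A^{-1}, return (A + u v^T)^{-1} in O(D^2) via Sherman-Morrison:
    (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u).
    """
    Au = A_inv @ u              # A^{-1} u, O(D^2)
    vA = v @ A_inv              # v^T A^{-1}, O(D^2)
    denom = 1.0 + v @ Au        # scalar; must be nonzero
    return A_inv - np.outer(Au, vA) / denom

# Check against direct inversion on a small well-conditioned matrix
rng = np.random.default_rng(0)
D = 4
A = np.eye(D) + 0.1 * rng.standard_normal((D, D))
u, v = rng.standard_normal(D), rng.standard_normal(D)
updated = sherman_morrison(np.linalg.inv(A), u, v)
assert np.allclose(updated, np.linalg.inv(A + np.outer(u, v)))
```

The rank-one structure is what makes the saving possible; it applies here because each new observation changes Λ_k by an outer product of its input vector.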
The biggest weakness of the prototype algorithm that was presented here is that the number of operations required to find the parameters of the mixing model scales with K³D_V³. This is due to the inversion of the (KD_V) × (KD_V) Hessian matrix that is required at each iteration of the IRLS algorithm. To apply