Function VarMixBound(G, R, V, Λ_V^{-1}, a_β, b_β)
Input: mixing matrix G, responsibilities matrix R, mixing weight matrix V,
  mixing covariance matrix Λ_V^{-1}, mixing weight prior parameters a_β, b_β
Output: mixing component L_M(q) of variational bound
1  get D_V, K from shape of V
2  L_{M,1}(q) ← K(−ln Γ(a_β) + a_β ln b_β)
3  for k = 1 to K do
4      a_{β_k}, b_{β_k} ← pick from a_β, b_β
5      L_{M,1}(q) ← L_{M,1}(q) + ln Γ(a_{β_k}) − a_{β_k} ln b_{β_k}
6  L_{M,2}(q) ← Sum(R ⊗ FixNaN(ln(G ⊘ R), 0))
7  L_{M,3}(q) ← (1/2) ln |Λ_V^{-1}| + K D_V / 2
8  return L_{M,1}(q) + L_{M,2}(q) + L_{M,3}(q)
Function VarMixBound computes the mixing component L_M(q) of L(q) by evaluating (7.95). As in VarClBound, the computation of L_M(q) is split into the components L_{M,1}(q), L_{M,2}(q), and L_{M,3}(q), whose sum is returned. L_{M,1}(q) contains the components of L_M(q) that depend on the parameters of q_β(β), and is computed in Lines 2 to 5 by iterating over all k. L_{M,2}(q) is the Kullback-Leibler divergence KL(R‖G), as given by (7.84), which is computed in the same way as in Line 17 of Function TrainMixWeights.
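For concreteness, the following is a minimal numpy sketch of this computation. It assumes (these choices are mine, not the book's) that the scalar prior parameters are available as global constants, that a_beta and b_beta hold the K posterior parameters a_{β_k}, b_{β_k}, and that FixNaN's convention 0 · ln(·) = 0 is realized by masking zero responsibilities; all names are illustrative.

```python
import numpy as np
from scipy.special import gammaln

A_BETA, B_BETA = 1e-2, 1e-4   # prior hyperparameters (placeholder values)

def var_mix_bound(G, R, V, Lambda_V_inv, a_beta, b_beta):
    """Sketch of VarMixBound: mixing component L_M(q) of the bound.

    G, R           : (N, K) mixing and responsibility matrices
    V              : (D_V, K) mixing weight matrix
    Lambda_V_inv   : (K*D_V, K*D_V) mixing covariance matrix
    a_beta, b_beta : length-K arrays of posterior parameters a_beta_k, b_beta_k
    """
    D_V, K = V.shape

    # Lines 2-5: components that depend on the parameters of q_beta(beta)
    L_M1 = K * (-gammaln(A_BETA) + A_BETA * np.log(B_BETA))
    L_M1 += np.sum(gammaln(a_beta) - a_beta * np.log(b_beta))

    # Line 6: sum_{n,k} r_nk ln(g_nk / r_nk); masking r_nk = 0 entries
    # plays the role of FixNaN, using the convention 0 ln 0 = 0
    mask = R > 0
    L_M2 = np.sum(R[mask] * np.log(G[mask] / R[mask]))

    # Line 7: (1/2) ln |Lambda_V^{-1}| + K D_V / 2, via a stable log-determinant
    _, logdet = np.linalg.slogdet(Lambda_V_inv)
    L_M3 = 0.5 * logdet + K * D_V / 2.0

    return L_M1 + L_M2 + L_M3
```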
8.1.5 Scaling Issues
Let us now consider how the presented algorithm scales with the dimensionality of the input space D_X, the output space D_Y, the mixing feature space D_V, the number N of observations that are available, and the number K of classifiers. All O(·) estimates are based on the observation that the multiplication of an a × b matrix with a b × c matrix scales with O(abc), and that the inversion and the determinant of an a × a matrix have complexity O(a³) and O(a²), respectively.
Table 8.1 gives an overview of how the different functions scale with N, K, D_X, D_Y, and D_V. Unfortunately, even though ModelProbability scales linearly with N and D_Y, it scales well neither with D_X nor with K and D_V. In all three cases, the cubic scaling is caused by a matrix inversion.
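To make the cubic inversion cost tangible, the following snippet (illustrative, not from the book) times numpy's inversion for doubling matrix sizes; each doubling should increase the runtime roughly eightfold.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
for n in (200, 400, 800):
    A = rng.normal(size=(n, n))          # random n x n matrix
    t0 = time.perf_counter()
    np.linalg.inv(A)                     # O(n^3) inversion
    print(f"n = {n:4d}: {time.perf_counter() - t0:.4f} s")
```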
Considering that the D_X³ term is due to inverting the precision matrix Λ_k, it might be reducible to D_X² by using the Sherman-Morrison formula, as shown in Sect. 5.3.5.
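As a reminder of how this works, the Sherman-Morrison formula expresses a rank-one update of a maintained inverse as (A + xxᵀ)⁻¹ = A⁻¹ − (A⁻¹x)(A⁻¹x)ᵀ / (1 + xᵀA⁻¹x), which costs O(D_X²) per observation instead of the O(D_X³) of a fresh inversion. The sketch below uses illustrative names, not the book's code:

```python
import numpy as np

def sherman_morrison_update(A_inv, x):
    """Return (A + x x^T)^{-1} given A^{-1}, in O(D^2) operations."""
    Ax = A_inv @ x                       # A^{-1} x, costs O(D^2)
    return A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)

# Incrementally tracking the inverse of a precision matrix as inputs arrive
rng = np.random.default_rng(1)
D_X = 2
Lambda_inv = np.eye(D_X)                 # inverse of the initial precision
for _ in range(5):
    x = rng.normal(size=D_X)             # input vector of one observation
    Lambda_inv = sherman_morrison_update(Lambda_inv, x)
```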
D_X is the dimensionality of the input space with respect to the classifier model, and is given by D_X = 1 for averaging classifiers, and by D_X = 2 for classifiers that model straight lines. Thus, it is in general not too high, and D_X³ will not be the most influential complexity component. In any case, as long as we are required to maintain a covariance matrix Λ_k^{-1} of size D_X × D_X, the influence of D_X is unlikely to be reducible below D_X².
The biggest weakness of the prototype algorithm presented here is that the number of operations required to find the parameters of the mixing model scales with K³D_V³. This is due to the inversion of the (K D_V) × (K D_V) Hessian matrix that is required at each iteration of the IRLS algorithm. To apply
 