[Figure 4.4: two surface plots, (a) $g_1(x)$ and (b) $g_2(x)$, each drawn over $x_1$ and $x_2$
from -4 to 4, with function values from 0 to 1.]
Fig. 4.4. Plots showing the generalised softmax function (4.22) for 2 classifiers with
inputs $x = (1, x_1, x_2)^T$ and $\phi(x) = x$, where Classifier 1 in plot (a) has gating
parameters $v_1 = (0, 0, 1)^T$ and matches a circle of radius 3 around the origin, and
Classifier 2 in plot (b) has gating parameters $v_2 = (0, 1, 0)^T$ and matches all inputs.
and $m_2(x) = 1$ for all $x$. Therefore, Classifier 1 matches a circle of radius 3
around the origin, and Classifier 2 matches the whole input space. The values
for $g_1(x)$ and $g_2(x)$ are shown in Figs. 4.4(a) and 4.4(b), respectively. As can
be seen, every part of the input space that is not matched by Classifier 1
is fully assigned to Classifier 2, with $g_2(x) = 1$. In the circular area where both
classifiers match, the softmax function performs a soft linear partitioning of the
input space, just as in Fig. 4.2.
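The behaviour shown in Fig. 4.4 can be checked numerically. The following is a minimal sketch, assuming that (4.22) has the form $g_k(x) = m_k(x) \exp(v_k^T \phi(x)) / \sum_j m_j(x) \exp(v_j^T \phi(x))$. The names `generalised_softmax` and `gating_at`, the hard circular matching function with an inclusive boundary, and the use of NumPy are illustrative choices, not taken from the text.

```python
import numpy as np

def generalised_softmax(x, V, matching, phi=lambda x: x):
    """Gating weights g_k(x) of all classifiers for one input x.

    x        : input vector, here (1, x1, x2)
    V        : array of shape (K, D), one gating vector v_k per row
    matching : list of K functions, each returning m_k(x) in [0, 1]
    phi      : transfer function; the identity gives phi(x) = x
    Assumes at least one classifier matches x (otherwise the sum is zero).
    """
    m = np.array([m_k(x) for m_k in matching], dtype=float)
    activations = V @ phi(x)
    # Shifting by the maximum avoids overflow and leaves the ratio unchanged.
    weights = m * np.exp(activations - activations.max())
    return weights / weights.sum()

# Gating parameters from the caption of Fig. 4.4.
V = np.array([[0.0, 0.0, 1.0],   # v_1
              [0.0, 1.0, 0.0]])  # v_2

# Classifier 1 matches a circle of radius 3 around the origin
# (assumption: the boundary is included); Classifier 2 matches everything.
matching = [
    lambda x: 1.0 if x[1] ** 2 + x[2] ** 2 <= 9.0 else 0.0,
    lambda x: 1.0,
]

def gating_at(x1, x2):
    """Hypothetical helper: gating weights at the point (x1, x2)."""
    return generalised_softmax(np.array([1.0, x1, x2]), V, matching)

print(gating_at(0.0, 0.0))  # both match, equal activations: equal weights
print(gating_at(0.0, 2.0))  # both match: Classifier 1 gets about 0.88
print(gating_at(4.0, 0.0))  # outside the circle: all weight on Classifier 2
```

Inside the circle the weights depend only on $x_2 - x_1$, which is the soft linear partitioning; outside it, the zero matching value of Classifier 1 forces $g_2(x) = 1$ regardless of the gating parameters.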
The effect of changing the transfer function to $\phi(x) = 1$ is visualised in
Fig. 4.5, which shows that in this case no linear partitioning takes place. Rather,
in areas of the input space that both classifiers match, (4.22) assigns the
generation probabilities independently of the input, in proportion to the
exponential of the gating parameters $v_1 = 0.7$ and $v_2 = 0.3$.
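As a worked check of the input-independent case, the sketch below evaluates the gating weights for $\phi(x) = 1$ with scalar gating parameters 0.7 and 0.3. It assumes the same form of (4.22) as a matching-weighted softmax; the function name `constant_transfer_gating` is hypothetical.

```python
import math

def constant_transfer_gating(v, m):
    """Gating weights when phi(x) = 1, so each activation is just v_k.

    v : list of scalar gating parameters, one per classifier
    m : list of matching values m_k(x) in [0, 1] at the input of interest
    Assumes at least one matching value is non-zero.
    """
    weights = [m_k * math.exp(v_k) for v_k, m_k in zip(v, m)]
    total = sum(weights)
    return [w / total for w in weights]

v = [0.7, 0.3]

# Where both classifiers match, the weights are the same for every input.
inside = constant_transfer_gating(v, m=[1.0, 1.0])
# Where only Classifier 2 matches, it receives all of the weight.
outside = constant_transfer_gating(v, m=[0.0, 1.0])

print(inside)   # roughly [0.599, 0.401]
print(outside)  # [0.0, 1.0]
```

Because the input no longer enters the exponent, the split inside the jointly matched area is a constant ratio of $e^{0.7}$ to $e^{0.3}$, and only the matching functions introduce any dependence on $x$.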
Besides localisation beyond matching, the generalised MoE model has another
feature that distinguishes it from any previous LCS³: it allows for matching
by a degree in the range $[0, 1]$ rather than by just specifying where a classifier
matches and where it does not (as, for example, specified by the set $\mathcal{X}_k$ and
(3.9)). Additionally, by (4.19), this degree has the well-defined meaning of the
probability $p(m_k = 1 \mid x)$ of classifier $k$ matching input $x$. Alternatively, by
observing that $\mathbb{E}(m_k \mid x) = p(m_k = 1 \mid x)$, this degree can also be interpreted as the
³ While Butz seems to have experimented with matching by a degree in [41], he does
not describe how it is implemented and only states that “Preliminary experiments in
that respect [...] did not yield any further improvement in performance”. Furthermore,
his hyper-ellipsoidal conditions [41, 52] might look like matching by degree
on initial inspection, but as he determines matching by a threshold on the basis
function, matching is still binary. Fuzzy LCS (for example, [60]), on the other hand,
provide matching by degree but are usually not developed from the bottom up, which
makes modifying the parameter update equations difficult.
 