[Figure 4.4: two panels, (a) $g_1(\mathbf{x})$ and (b) $g_2(\mathbf{x})$, each plotted over $x_1$ and $x_2$ from -4 to 4, with values between 0 and 1.]
Fig. 4.4. Plots showing the generalised softmax function (4.22) for 2 classifiers with inputs $\mathbf{x} = (1, x_1, x_2)^T$ and $\phi(\mathbf{x}) = \mathbf{x}$, where Classifier 1 in plot (a) has gating parameters $\mathbf{v}_1 = (0, 0, 1)^T$ and matches a circle of radius 3 around the origin, and Classifier 2 in plot (b) has gating parameters $\mathbf{v}_2 = (0, 1, 0)^T$ and matches all inputs.
and $m_2(\mathbf{x}) = 1$ for all $\mathbf{x}$. Therefore, Classifier 1 matches a circle of radius 3 around the origin, and Classifier 2 matches the whole input space. The values for $g_1(\mathbf{x})$ and $g_2(\mathbf{x})$ are shown in Figs. 4.4(a) and 4.4(b), respectively. As can be seen, the whole part of the input space that is not matched by Classifier 1 is fully assigned to Classifier 2 by $g_2(\mathbf{x}) = 1$. In the circular area where both classifiers match, the softmax function performs a soft linear partitioning of the input space, just as in Fig. 4.2.
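The setup of Fig. 4.4 can be reproduced numerically with a minimal sketch. Equation (4.22) is not restated in this excerpt, so the code assumes it has the form $g_k(\mathbf{x}) = m_k(\mathbf{x}) \exp(\mathbf{v}_k^T \phi(\mathbf{x})) / \sum_j m_j(\mathbf{x}) \exp(\mathbf{v}_j^T \phi(\mathbf{x}))$, which agrees with the behaviour described above. The function name `generalised_softmax`, the list `match_fns`, and the choice to include the circle boundary in the matched region are illustrative assumptions, not taken from the text.

```python
import numpy as np

def generalised_softmax(x, V, match_fns, phi=lambda x: x):
    """Assumed form of (4.22):
    g_k(x) = m_k(x) exp(v_k^T phi(x)) / sum_j m_j(x) exp(v_j^T phi(x))."""
    m = np.array([m_k(x) for m_k in match_fns], dtype=float)  # matching m_k(x)
    a = V @ np.atleast_1d(phi(x))                             # activations v_k^T phi(x)
    w = m * np.exp(a - a.max())   # shifting by the max leaves the ratio unchanged
    return w / w.sum()            # undefined if no classifier matches x

# Gating parameters from the caption of Fig. 4.4 (one row per classifier).
V = np.array([[0.0, 0.0, 1.0],    # v_1
              [0.0, 1.0, 0.0]])   # v_2

# Binary matching functions; x = (1, x1, x2), so x[1] and x[2] are the coordinates.
match_fns = [
    lambda x: float(x[1] ** 2 + x[2] ** 2 <= 9.0),  # Classifier 1: circle of radius 3
    lambda x: 1.0,                                   # Classifier 2: matches everywhere
]

for x1, x2 in [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (4.0, 0.0)]:
    x = np.array([1.0, x1, x2])
    print((x1, x2), generalised_softmax(x, V, match_fns))
```

Under these assumptions, the point (4, 0) lies outside the circle and is assigned entirely to Classifier 2, while inside the circle the two weights vary smoothly with $x_2 - x_1$ and are equal along the line $x_1 = x_2$.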
The effect of changing the transfer function to $\phi(\mathbf{x}) = 1$ is visualised in Fig. 4.5, and shows that in such a case no linear partitioning takes place. Rather, in areas of the input space that both classifiers match, (4.22) assigns the generation probabilities input-independently in proportion to the exponential of the gating parameters $v_1 = 0.7$ and $v_2 = 0.3$.
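As a quick check of the constant-transfer-function case, the following sketch evaluates the gating weights for $v_1 = 0.7$ and $v_2 = 0.3$. It assumes that with $\phi(\mathbf{x}) = 1$ the generalised softmax reduces to $g_k = m_k \exp(v_k) / \sum_j m_j \exp(v_j)$; the helper name `gating_constant_phi` is hypothetical.

```python
import numpy as np

def gating_constant_phi(v, m):
    """Gating weights for phi(x) = 1: proportional to m_k * exp(v_k) (assumed form)."""
    w = np.asarray(m, dtype=float) * np.exp(np.asarray(v, dtype=float))
    return w / w.sum()

v = [0.7, 0.3]                               # scalar gating parameters from the text

g_both = gating_constant_phi(v, m=[1, 1])    # region matched by both classifiers
g_only2 = gating_constant_phi(v, m=[0, 1])   # region matched by Classifier 2 only

print(np.round(g_both, 4))   # approximately [0.5987 0.4013]
print(g_only2)               # [0. 1.]
```

Where both classifiers match, the weights are the same at every input, roughly 0.60 for Classifier 1 and 0.40 for Classifier 2, which is the input-independent assignment described above.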
Besides localisation beyond matching, the generalised MoE model has another feature that distinguishes it from any previous LCS³: it allows for matching by a degree in the range [0, 1] rather than by just specifying where a classifier matches and where it does not (as, for example, specified by set $X_k$ and (3.9)). Additionally, by (4.19), this degree has the well-defined meaning of the probability $p(m_k = 1 \mid \mathbf{x})$ of classifier $k$ matching input $\mathbf{x}$. Alternatively, by observing that $E(m_k \mid \mathbf{x}) = p(m_k = 1 \mid \mathbf{x})$, this degree can also be interpreted as the
³ While Butz seems to have experimented with matching by a degree in [41], he does not describe how it is implemented and only states that “Preliminary experiments in that respect [...] did not yield any further improvement in performance”. Furthermore, his hyper-ellipsoidal conditions [41, 52] might look like matching by degree on initial inspection, but as he determines matching by a threshold on the basis function, matching is still binary. Fuzzy LCS (for example, [60]), on the other hand, provide matching by degree but are usually not developed from the bottom up which makes modifying the parameter update equations difficult.