related previous study [83], to linear LCS models, and to models that treat
classifiers and mixing model as separate components by design.
6.1 Using the Generalised Softmax Function
By relating the probabilistic structure of LCS to the Mixtures-of-Experts model
in Chap. 4, the probability of classifier k generating the nth observation is given
by the generalised softmax function (4.22), that is,

$$g_k(x_n) = \frac{m_k(x_n) \exp(v_k^T \phi(x_n))}{\sum_{j=1}^{K} m_j(x_n) \exp(v_j^T \phi(x_n))}, \tag{6.2}$$
where $V = \{v_k\}$ is the set of mixing model parameters $v_k \in \mathbb{R}^{D_V}$,
and $\phi(x)$ is a transfer function that maps the input space $\mathcal{X}$ into some
$D_V$-dimensional real space $\mathbb{R}^{D_V}$. In LCS, this function is usually
$\phi(x) = 1$ for all $x \in \mathcal{X}$, with $D_V = 1$, but to stay general, we do
not make any assumptions about the form of $\phi$.
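As an illustration of (6.2), the following is a minimal sketch in plain Python that evaluates the generalised softmax for a single input. The function name `generalised_softmax` and its argument names are hypothetical and not taken from the text; the sketch assumes that the matching values m_k(x) and the transfer function value φ(x) have already been computed, and that at least one classifier matches the input.

```python
import math

def generalised_softmax(matching, V, phi_x):
    """Return [g_1(x), ..., g_K(x)] as in (6.2) for one input x.

    matching: the K matching values m_k(x) (non-negative, at least one > 0)
    V:        the K parameter vectors v_k, each of length D_V
    phi_x:    the transfer function value phi(x), of length D_V
    """
    # Activations v_k^T phi(x) for every classifier.
    acts = [sum(v * p for v, p in zip(v_k, phi_x)) for v_k in V]
    # Shift by the largest activation among matching classifiers. This
    # leaves the ratio in (6.2) unchanged but avoids overflow in exp().
    a_max = max(a for m, a in zip(matching, acts) if m > 0.0)
    weighted = [m * math.exp(a - a_max) if m > 0.0 else 0.0
                for m, a in zip(matching, acts)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Usual LCS setting: phi(x) = 1 and D_V = 1. The third classifier does not
# match x, so it receives zero probability whatever its parameter is.
print(generalised_softmax([1.0, 1.0, 0.0], [[0.0], [0.0], [5.0]], [1.0]))
# prints [0.5, 0.5, 0.0]
```

The example shows the role of matching: a classifier with m_k(x) = 0 cannot generate the observation, and the remaining probability mass is shared among the matching classifiers according to their activations.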
Assuming knowledge of the predictive densities of all classifiers $p(y | x, \theta_k)$, the
data likelihood (6.1) is maximised by the expectation-maximisation algorithm
by finding the values for V that maximise (4.13), given by

$$\sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \ln g_k(x_n). \tag{6.3}$$
In the above equation, $r_{nk}$ stands for the responsibility of classifier k for
observation n, given by (4.12), that is

$$r_{nk} = \frac{g_k(x_n) \, p(y_n | x_n, \theta_k)}{\sum_{j=1}^{K} g_j(x_n) \, p(y_n | x_n, \theta_j)}. \tag{6.4}$$
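The responsibilities in (6.4) can be sketched in a few lines of plain Python. The function name `responsibilities` and the nested-list layout (one row per observation, one column per classifier) are assumptions made for this illustration; the predictive densities p(y_n | x_n, θ_k) are taken as given numbers rather than computed from a classifier model.

```python
def responsibilities(g, lik):
    """Return r[n][k] as in (6.4).

    g:   g[n][k] = g_k(x_n), the mixing probabilities
    lik: lik[n][k] = p(y_n | x_n, theta_k), the classifier densities
    Assumes every observation has non-zero density under at least one
    classifier with g_k(x_n) > 0, so that the normaliser is positive.
    """
    r = []
    for g_n, lik_n in zip(g, lik):
        # Numerator of (6.4) for each classifier k.
        joint = [g_nk * p_nk for g_nk, p_nk in zip(g_n, lik_n)]
        # Denominator of (6.4): the sum over all classifiers j.
        total = sum(joint)
        r.append([j / total for j in joint])
    return r
```

For instance, with a single observation, `responsibilities([[0.5, 0.5]], [[0.2, 0.6]])` gives responsibilities of approximately 0.25 and 0.75: both classifiers are mixed equally, but the second explains the observation three times as well.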
Thus, we want to fit the mixing model to the data by minimising the cross-entropy
$-\sum_n \sum_k r_{nk} \ln g_k(x_n)$ between the responsibilities and the generative
mixing model.
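To make the objective concrete, the sketch below evaluates this cross-entropy, the negative of (6.3), for given responsibilities and mixing probabilities. The function name `mixing_cross_entropy` is hypothetical, and terms with zero responsibility are skipped under the usual convention that 0 ln 0 = 0.

```python
import math

def mixing_cross_entropy(r, g):
    """Return -sum_n sum_k r[n][k] * ln g[n][k], the negative of (6.3).

    r: r[n][k], the responsibilities (each row sums to 1)
    g: g[n][k] = g_k(x_n), the mixing probabilities
    Assumes g[n][k] > 0 wherever r[n][k] > 0; otherwise math.log raises
    a ValueError.
    """
    total = 0.0
    for r_n, g_n in zip(r, g):
        for r_nk, g_nk in zip(r_n, g_n):
            if r_nk > 0.0:  # convention: 0 * ln(0) contributes nothing
                total -= r_nk * math.log(g_nk)
    return total

r = [[0.25, 0.75]]
# Mixing equally costs more than mixing in proportion to the responsibilities.
print(mixing_cross_entropy(r, [[0.5, 0.5]]) > mixing_cross_entropy(r, r))
# prints True
```

For fixed responsibilities, the cross-entropy is smallest when the mixing probabilities equal the responsibilities. The mixing model can generally not reach that bound, since g_k is restricted to the form (6.2); fitting V brings it as close as that form allows.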
6.1.1 Batch Learning by Iterative Reweighted Least Squares
The softmax function is a generalised linear model, and specialised tools have
been developed to fit such models [165]. Even though a generalisation of this
function is used, the same tools are applicable, as shown in this section. In
particular, the Iterative Reweighted Least Squares (IRLS) algorithm will be employed to
find the mixing model parameters.
The IRLS can be derived by applying the Newton-Raphson iterative optimisation
scheme [19] that, for minimising an error function E(V), takes the form

$$V^{(\text{new})} = V^{(\text{old})} - H^{-1} \nabla E(V), \tag{6.5}$$
 