A Probabilistic Model for LCS - Design and Analysis of Learning Classifier Systems


such that the gating network best fits the previously calculated responsibilities. Equation (4.14) causes the experts to be trained only on the areas that they are assigned to by the responsibilities. The next expectation step re-evaluates the responsibilities according to the new fit of the experts, and the maximisation step adapts the gating network and the experts again. Hence, iterating the expectation and the maximisation step causes the experts to be distributed according to their best fit to the data.
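To illustrate how this alternation localises the experts, the following minimal sketch runs a few EM iterations for two scalar-input experts on toy data. It is an illustration under assumptions rather than the procedure of this chapter: the gating weights are held fixed and uniform (so the gating update is omitted), the noise precision is fixed at 1, the expert update is taken to be responsibility-weighted least squares, and all function names and data are hypothetical.

```python
import math


def gaussian_pdf(y, mean, precision):
    """Density of a Gaussian with the given mean and precision (inverse variance)."""
    return math.sqrt(precision / (2.0 * math.pi)) * math.exp(
        -0.5 * precision * (y - mean) ** 2
    )


def e_step(xs, ys, weights, gating, precision=1.0):
    """Responsibilities r[n][k], proportional to g_k(x_n) * p(y_n | x_n, w_k)."""
    resp = []
    for x, y in zip(xs, ys):
        joint = [
            g * gaussian_pdf(y, w * x, precision)
            for g, w in zip(gating(x), weights)
        ]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp


def m_step_experts(xs, ys, resp, n_experts):
    """Responsibility-weighted least squares for experts of the form y ~ w_k * x."""
    new_weights = []
    for k in range(n_experts):
        num = sum(r[k] * x * y for x, y, r in zip(xs, ys, resp))
        den = sum(r[k] * x * x for x, r in zip(xs, resp))
        new_weights.append(num / den)
    return new_weights


def uniform_gating(x):
    """Assumption: fixed, input-independent gating weights for two experts."""
    return [0.5, 0.5]


# Toy data following y = |x|, which no single linear expert can fit.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [2.0, 1.0, 1.0, 2.0]
weights = [-0.5, 0.5]  # hypothetical initial slopes

for _ in range(10):
    resp = e_step(xs, ys, weights, uniform_gating)
    weights = m_step_experts(xs, ys, resp, n_experts=2)
```

On this data the two experts end up with slopes of opposite sign, each fitting the half of the input space for which it holds most of the responsibility.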

The pattern of localisation is determined by the form of the gating model. As previously demonstrated, the softmax function causes a soft linear partition of the input space. Thus, the underlying assumption of the model is that the data was generated by some processes that are linearly separated in the input space. The model structure becomes richer by adding hierarchies to the gating network [121]. That would move MoE too far away from LCS, which is why it will not be discussed any further.
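The soft linear partition can be made concrete with a small sketch. It assumes the usual softmax gating form, in which expert k has a gating vector v_k and receives the weight exp(v_k · x) divided by the sum of exp(v_j · x) over all experts j. The symbol v_k, the function name, and the example values are assumptions made for illustration. Two experts receive equal weight wherever (v_j - v_k) · x = 0, which is a hyper-plane, hence the linear boundaries.

```python
import math


def softmax_gating(x, gating_vectors):
    """Softmax over the activations v_k . x, shifted by the maximum for stability."""
    activations = [
        sum(v_i * x_i for v_i, x_i in zip(v, x)) for v in gating_vectors
    ]
    shift = max(activations)
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical gating vectors for two experts; the second input
# component is a constant 1 acting as a bias term.
V = [[-3.0, 0.0], [3.0, 0.0]]

left = softmax_gating([-2.0, 1.0], V)      # deep inside the region of expert 0
boundary = softmax_gating([0.0, 1.0], V)   # on the separating hyper-plane
right = softmax_gating([2.0, 1.0], V)      # deep inside the region of expert 1
```

Scaling the gating vectors up sharpens the transition between the regions, while scaling them down makes the partition softer.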

4.1.5 Training Issues

The likelihood function of MoE is neither convex nor unimodal [20]. Hence, training it by using a hill-climbing procedure such as the EM-algorithm will not guarantee that we find the global maximum. Several approaches have been developed to deal with this problem (for example, [20, 4]), all of which are either based on random restart or stochastic global optimisers. Hence, they require several training epochs and/or a long training time. While this is not an issue for MoE where the global optimum only needs to be found once, it is not an option for LCS where the model needs to be (at least partially) re-trained for each change in the model structure. A potential LCS-related solution will be presented in Sect. 4.4.
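The effect of a multimodal objective on hill-climbing, and the cost of random restarts, can be sketched on a toy problem. The objective below is a hypothetical one-dimensional stand-in with several local maxima, not an MoE likelihood, and the greedy climber is a simple substitute for EM. All names and constants are assumptions.

```python
import math
import random


def objective(theta):
    """Toy multimodal function standing in for a log-likelihood."""
    return math.sin(3.0 * theta) - 0.1 * theta * theta


def hill_climb(theta, step=0.01, max_iters=10_000):
    """Greedy local ascent: move to the better neighbour until none improves."""
    for _ in range(max_iters):
        # The current point is listed first so that ties favour stopping.
        best = max((theta, theta - step, theta + step), key=objective)
        if best == theta:
            break
        theta = best
    return theta


def random_restarts(n_restarts, low=-5.0, high=5.0, seed=0):
    """Run the climber from several random starts and keep the best result."""
    rng = random.Random(seed)
    starts = [rng.uniform(low, high) for _ in range(n_restarts)]
    return max((hill_climb(s) for s in starts), key=objective)


single = hill_climb(4.0)     # a single run, stuck at a nearby local maximum
best = random_restarts(20)   # twenty runs, each paying the full training cost
```

The single run stops at the local maximum closest to its starting point, whereas the restarts recover a better one at twenty times the cost. This multiplied cost is what makes the approach unattractive when the model has to be re-trained after each structural change.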

4.2 Expert Models

So far, $p(y \mid x, \theta_k)$ has been left unspecified. Its form depends on the task that is to be solved, and differs for regression and classification tasks. Here, we only deal with the LCS-related univariate regression task and the multiclass classification tasks, for which the expert models are introduced in the following sections.

4.2.1 Experts for Linear Regression

For each expert $k$, the linear univariate regression model (that is, $D_Y = 1$) is characterised by a linear relation of the input $x$ and the adjustable parameter $w_k$, which is a vector of the same size as the input. Hence, the relation between the input $x$ and the output $y$ is modelled by a hyper-plane $w_k^T x - y = 0$. Additionally, the stochasticity and measurement noise are modelled by a Gaussian. Overall, the probabilistic model for expert $k$ is given by

$$p(y \mid x, w_k, \tau_k) = \mathcal{N}(y \mid w_k^T x, \tau_k^{-1}) = \left(\frac{\tau_k}{2\pi}\right)^{1/2} \exp\left(-\frac{\tau_k}{2} \left(w_k^T x - y\right)^2\right), \qquad (4.15)$$
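As a numerical check of (4.15), the following sketch evaluates the expert density and its logarithm for a single observation. The function names and example values are hypothetical. Here `w` plays the role of the weight vector of expert k and `tau` that of its noise precision, and the input is assumed to be given as a plain list of numbers.

```python
import math


def expert_density(y, x, w, tau):
    """Gaussian density of y with mean w . x and precision tau (variance 1 / tau)."""
    mean = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return math.sqrt(tau / (2.0 * math.pi)) * math.exp(
        -0.5 * tau * (mean - y) ** 2
    )


def expert_log_density(y, x, w, tau):
    """Logarithm of the same density, which avoids underflow for poor fits."""
    mean = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (mean - y) ** 2


# Hypothetical expert: slope 2 and offset 1, with the offset handled by
# appending a constant 1 to the input vector.
w = [2.0, 1.0]
x = [3.0, 1.0]   # the mean prediction is 2 * 3 + 1 = 7
tau = 4.0        # noise variance 1 / tau = 0.25

at_mean = expert_density(7.0, x, w, tau)
off_mean = expert_density(8.0, x, w, tau)
log_off_mean = expert_log_density(8.0, x, w, tau)
```

The density peaks where the observation lies on the hyper-plane and falls off with the squared distance from it, at a rate set by the precision.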
