Would the classifier set optimality criterion that was introduced in Chap. 7
also provide a safeguard against divergence at the model structure level;
that is, would divergent classifiers be detected? In contrast to XCS(F), the cri-
terion presented there does not assume a classifier to be a bad local model
as soon as its model error exceeds a certain threshold. Rather, it deems the
localisation of a classifier inappropriate only if the classifier's model is unable
to capture the apparent pattern hidden in the noisy data. It is therefore not
immediately clear whether the criterion would detect the divergent model as a
pattern that the classifier cannot model, or whether it would attribute it to noise.
In any case, providing stability at the model structure level amounts to repairing
the problem of divergence after it has occurred, and relies on the assumption that
changing the model structure does indeed provide the required stability. This
is not a satisfactory solution, which is why the focus should be on preventing
the problem from occurring in the first place, as discussed in the next section.
Stability on the Parameter Learning Level
Given a fixed model structure $\mathcal{M}$, the aim is to provide parameter
learning that is guaranteed to converge when used with DP methods. Recall that
both value iteration and policy iteration are guaranteed to converge if the
approximation architecture is a non-expansion with respect to the maximum norm
$\|\cdot\|_\infty$. Its being a non-expansion with respect to the weighted norm
$\|\cdot\|_D$, on the other hand, is sufficient for the convergence of the
policy evaluation step of policy iteration, but not of value iteration. In order
to guarantee stability of either method when using LCS, the LCS approximation
architecture needs to provide such a non-expansion.
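To see why the non-expansion property is the relevant one, consider the
standard composition argument (a sketch, assuming the Bellman operator $T$
contracts with factor $0 \le \gamma < 1$ with respect to $\|\cdot\|_\infty$):

$$\|\Pi T V - \Pi T V'\|_\infty \;\le\; \|T V - T V'\|_\infty \;\le\; \gamma \|V - V'\|_\infty,$$

so the combined operator $\Pi T$ of approximate value iteration,
$V_{t+1} = \Pi T V_t$, remains a contraction and converges to a unique fixed
point by the Banach fixed point theorem. The same argument with $\|\cdot\|_D$
and the policy evaluation operator $T^\pi$, which contracts with respect to
that weighted norm when $D$ reflects the stationary state distribution of the
evaluated policy, covers the policy evaluation case.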
Observe that having a single classifier that matches all states is a valid
model structure. For this model structure to provide a non-expansion, the
classifier model itself must form a non-expansion. Therefore, to ensure that the
LCS model provides the non-expansion property for any model structure, every
classifier model needs to form a non-expansion, and any mixture of a set of
localised classifiers that forms the global LCS model needs to form a
non-expansion as well. Formally, if $\|\cdot\|$ denotes the norm in question,
we need

$$\|\Pi V - \Pi V'\| \le \|V - V'\| \qquad (9.33)$$
to hold for any two $V$, $V'$, where $\Pi$ is the approximation operator of a
given LCS model structure. If the model structure is formed by a single
classifier that matches all states,

$$\|\Pi_k V - \Pi_k V'\| \le \|V - V'\| \qquad (9.34)$$

needs to hold for any two $V$, $V'$, where $\Pi_k$ is the approximation
operator of a single classifier. These requirements are independent of the LCS
model type.
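The following minimal numerical sketch (not part of the original text; the
matching regions and the averaging classifier model are illustrative
assumptions) checks (9.33) and (9.34) with respect to $\|\cdot\|_\infty$ for
averaging classifiers, each predicting the mean value over its matched states,
mixed by matching-proportional weights that sum to one at every state:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 8  # number of states in the toy problem

    # Two overlapping matching regions (hypothetical choice)
    matching = [np.arange(n) < 5, np.arange(n) >= 3]

    def classifier_approx(V, matched):
        # Pi_k V for an averaging classifier: its prediction is the mean
        # value over the states it matches (zero elsewhere, where its
        # mixing weight is zero anyway)
        out = np.zeros_like(V)
        out[matched] = V[matched].mean()
        return out

    def mixture_approx(V):
        # Pi V: mix the classifier predictions with state-wise weights
        # that are non-negative and sum to one at every state
        G = np.array(matching, dtype=float)
        G /= G.sum(axis=0)  # normalise over classifiers per state
        return sum(G[k] * classifier_approx(V, matching[k])
                   for k in range(len(matching)))

    for _ in range(1000):
        V, W = rng.normal(size=n), rng.normal(size=n)
        # Eq. (9.34): each single classifier is a non-expansion
        assert (np.abs(classifier_approx(V, matching[0])
                       - classifier_approx(W, matching[0])).max()
                <= np.abs(V - W).max() + 1e-12)
        # Eq. (9.33): so is the mixture of classifiers
        assert (np.abs(mixture_approx(V) - mixture_approx(W)).max()
                <= np.abs(V - W).max() + 1e-12)

The check passes because averaging is a non-expansion with respect to the
maximum norm; a general linear least-squares classifier model would not
necessarily satisfy (9.34) in this norm, which is the kind of question the
following sections address.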
Returning to the LCS model structure with independently trained classifiers,
the next two sections concentrate on its non-expansion property, firstly with
respect to $\|\cdot\|_\infty$, and then with respect to $\|\cdot\|_D$.