this deviation by the difference measure $h(\hat{f}(x) - f(x))$, where $h$ is some convex function $h : \mathbb{R} \to \mathbb{R}_+$, mixing by a weighted average allows for the derivation of an upper bound on this difference measure:
Theorem 6.1. Given the global estimator $\hat{f} : \mathcal{X} \to \mathbb{R}$, that is formed by a weighted averaging of $K$ local estimators $\hat{f}_k : \mathcal{X} \to \mathbb{R}$ by
$$\hat{f}(x) = \sum_k g_k(x)\, \hat{f}_k(x),$$
such that $g_k(x) \geq 0$ for all $x$ and $k$, and $\sum_k g_k(x) = 1$ for all $x$, the difference between the target function $f : \mathcal{X} \to \mathbb{R}$ and the global estimator is bounded from above by
$$h\bigl(\hat{f}(x) - f(x)\bigr) \leq \sum_k g_k(x)\, h\bigl(\hat{f}_k(x) - f(x)\bigr), \qquad \forall x \in \mathcal{X}, \tag{6.21}$$
where $h : \mathbb{R} \to \mathbb{R}_+$ is a convex function. More specifically, we have
$$\bigl(\hat{f}(x) - f(x)\bigr)^2 \leq \sum_k g_k(x) \bigl(\hat{f}_k(x) - f(x)\bigr)^2, \qquad \forall x \in \mathcal{X}, \tag{6.22}$$
and
$$\bigl|\hat{f}(x) - f(x)\bigr| \leq \sum_k g_k(x) \bigl|\hat{f}_k(x) - f(x)\bigr|, \qquad \forall x \in \mathcal{X}. \tag{6.23}$$
Proof. For any $x \in \mathcal{X}$, we have
$$h\bigl(\hat{f}(x) - f(x)\bigr) = h\Bigl(\sum_k g_k(x)\, \hat{f}_k(x) - f(x)\Bigr) = h\Bigl(\sum_k g_k(x) \bigl(\hat{f}_k(x) - f(x)\bigr)\Bigr) \leq \sum_k g_k(x)\, h\bigl(\hat{f}_k(x) - f(x)\bigr),$$
where we have used $\sum_k g_k(x) = 1$, and the inequality is Jensen's Inequality (for example, [231]), based on the convexity of $h$ and the weighted average property of the $g_k$. Having proven (6.21), (6.22) and (6.23) follow from the convexity of $h(a) = a^2$ and $h(a) = |a|$, respectively.
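The bound can also be checked numerically. The following sketch is purely illustrative (the random setup and all variable names are assumptions, not taken from the text): for arbitrary local predictions and normalised weights, the error of the mixed prediction never exceeds the weighted average of the local errors, for both choices of $h$ in (6.22) and (6.23).

```python
import numpy as np

# Numerical check of (6.22) and (6.23): for arbitrary local predictions
# f_k(x) and weights g_k(x) >= 0 with sum_k g_k(x) = 1, the error of the
# mixed prediction never exceeds the weighted average of the local errors.
rng = np.random.default_rng(0)
K, N = 5, 1000                           # number of classifiers, sample inputs

f_target = rng.normal(size=N)            # target f(x) at N sample inputs
f_local = rng.normal(size=(K, N))        # local estimates f_k(x)
g = rng.random(size=(K, N))
g /= g.sum(axis=0, keepdims=True)        # normalise so that sum_k g_k(x) = 1

f_mixed = (g * f_local).sum(axis=0)      # global estimate sum_k g_k(x) f_k(x)

# Left- and right-hand sides of (6.22), h(a) = a^2
lhs_sq = (f_mixed - f_target) ** 2
rhs_sq = (g * (f_local - f_target) ** 2).sum(axis=0)

# Left- and right-hand sides of (6.23), h(a) = |a|
lhs_abs = np.abs(f_mixed - f_target)
rhs_abs = (g * np.abs(f_local - f_target)).sum(axis=0)

assert np.all(lhs_sq <= rhs_sq + 1e-12)    # Jensen's inequality holds
assert np.all(lhs_abs <= rhs_abs + 1e-12)
print("bounds (6.22) and (6.23) hold at all", N, "sample points")
```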
Therefore, the error of the global estimator can be minimised by assigning high weights, that is, high values of $g_k(x)$, to classifiers whose local estimators have a small error. Observing in (6.18) that the value of $g_k(x)$ is directly proportional to the value of $\gamma_k(x)$, a good heuristic will assign high values to $\gamma_k(x)$ whenever the error of the local estimator can be expected to be small. The design of all heuristics is based on this intuition.
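As a purely illustrative sketch of this intuition (the inverse-error choice of $\gamma_k$ below is an assumption for demonstration purposes, not one of the heuristics developed in the text, and (6.18) itself is not reproduced here), one can set $\gamma_k$ inversely proportional to an estimate of the local error and normalise to obtain $g_k(x)$; the resulting mixed prediction then typically has a lower error than uniform mixing.

```python
import numpy as np

# Illustrative only: weight classifiers by the inverse of an (assumed known)
# local error estimate, normalise to obtain g_k(x), and compare against
# uniform mixing. Variable names and the setup are hypothetical.
rng = np.random.default_rng(1)
K, N = 5, 1000

f_target = rng.normal(size=N)
noise_sd = np.array([0.1, 0.5, 1.0, 2.0, 4.0])           # local estimator quality
f_local = f_target + noise_sd[:, None] * rng.normal(size=(K, N))

est_err = noise_sd ** 2                                   # estimated local error
gamma = 1.0 / est_err                                     # high gamma_k for low expected error
g = np.repeat(gamma[:, None], N, axis=1)
g /= g.sum(axis=0, keepdims=True)                         # sum_k g_k(x) = 1

mse_weighted = np.mean(((g * f_local).sum(axis=0) - f_target) ** 2)
mse_uniform = np.mean((f_local.mean(axis=0) - f_target) ** 2)
print(f"inverse-error weighting MSE: {mse_weighted:.4f}")
print(f"uniform mixing MSE:          {mse_uniform:.4f}")
```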
The probabilistic formulation of the LCS model results in a further bound,
this time on the variance of the output prediction:
Theorem 6.2. Given the density $p(y \mid x, \theta)$ for output $y$ given input $x$ and parameters $\theta$, formed by the $K$ classifier model densities $p(y \mid x, \theta_k)$ by $p(y \mid x, \theta) = \sum_k g_k(x)\, p(y \mid x, \theta_k)$, such that $g_k(x) \geq 0$ for all $x$ and $k$, and $\sum_k g_k(x) = 1$ for all $x$,