Using the normalised least mean squared (NLMS) algorithm as described in
Sect. 5.3.4, the weight vector estimate update for classifier k is given by
\[
w_{k,t+1} = w_{k,t} + \alpha\, m_k(x_t, a_t)\, \frac{x_t}{\|x_t\|^2} \left( Q_{t+1}(x_t, a_t) - w_{k,t}^\top x_t \right), \tag{9.28}
\]
where $\alpha$ denotes the step size, and $Q_{t+1}(x_t, a_t)$ is given by (9.26). As discussed
in more detail in Sect. 9.3.6, this is the weight vector update of XCSF.
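As a minimal sketch of a single step of (9.28), the update can be written in plain Python. The function name `nlms_update` and its argument names are illustrative assumptions, and `q_target` simply stands in for the value of $Q_{t+1}(x_t, a_t)$ that (9.26) would supply.

```python
def nlms_update(w, x, q_target, m, alpha):
    """One NLMS step for a single classifier, following (9.28).

    w        -- current weight vector w_{k,t} (list of floats)
    x        -- input vector x_t (list of floats, not all zero)
    q_target -- target value, standing in for Q_{t+1}(x_t, a_t) from (9.26)
    m        -- matching value m_k(x_t, a_t)
    alpha    -- step size
    """
    prediction = sum(wi * xi for wi, xi in zip(w, x))  # w_{k,t}^T x_t
    sq_norm = sum(xi * xi for xi in x)                 # ||x_t||^2
    scale = alpha * m * (q_target - prediction) / sq_norm
    return [wi + scale * xi for wi, xi in zip(w, x)]
```

With `alpha = 1` and `m = 1`, one step moves the prediction for `x` exactly onto the target, which is the effect of normalising by $\|x_t\|^2$. With `m = 0` the classifier does not match and its weights are left unchanged.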
The noise variance of the model can be estimated by the LMS algorithm, as
described in Sect. 5.3.7. This results in the update equation
\[
\tau^{-1}_{k,t+1} = \tau^{-1}_{k,t} + \alpha\, m_k(x_t, a_t) \left( \left( w_{k,t+1}^\top x_t - Q_{t+1}(x_t, a_t) \right)^2 - \tau^{-1}_{k,t} \right), \tag{9.29}
\]
where $\alpha$ is again the scalar step size, and $Q_{t+1}(x_t, a_t)$ is given by (9.26).
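The noise variance update (9.29) is a one-line LMS step towards the squared residual of the already updated weight vector. The sketch below assumes the same illustrative conventions as before: `noise_variance_update` is a hypothetical name, and `q_target` stands in for $Q_{t+1}(x_t, a_t)$ from (9.26).

```python
def noise_variance_update(tau_inv, w_new, x, q_target, m, alpha):
    """One LMS step for the noise variance estimate, following (9.29).

    tau_inv  -- current estimate tau^{-1}_{k,t}
    w_new    -- updated weight vector w_{k,t+1} (list of floats)
    x        -- input vector x_t (list of floats)
    q_target -- target value, standing in for Q_{t+1}(x_t, a_t) from (9.26)
    m        -- matching value m_k(x_t, a_t)
    alpha    -- step size
    """
    residual = sum(wi * xi for wi, xi in zip(w_new, x)) - q_target
    return tau_inv + alpha * m * (residual ** 2 - tau_inv)
```

For instance, with a current estimate of 1.0, a residual of 2, `m = 1` and `alpha = 0.5`, the estimate moves halfway from 1.0 towards the squared residual 4, giving 2.5.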
9.3.5 Q-Learning by Recursive Least Squares
As shown in Chap. 5, incremental methods based on gradient descent might
suffer from slow convergence rates. Thus, despite their higher computational and
space complexity, methods based on directly tracking the least squares solution
are to be preferred. Consequently, rather than using NLMS, this section shows
how to apply recursive least squares (RLS) and direct noise precision tracking
to Q-Learning with LCS.
The non-stationarity of the action-value function estimate needs to be taken
into account by using a recency-weighted RLS variant that puts more weight on
recent observations. This was not an issue for the NLMS algorithm, as it performs
recency-weighting implicitly.
Minimising the recency-weighted variant of the sum of squared errors (9.27),
the update equations are according to Sect. 5.3.5 given by
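\[
w_{k,t+1} = \lambda^{m_k(x_t, a_t)}\, w_{k,t} + m_k(x_t, a_t)\, \Lambda^{-1}_{k,t+1}\, x_t \left( Q_{t+1}(x_t, a_t) - w_{k,t}^\top x_t \right), \tag{9.30}
\]
\[
\Lambda^{-1}_{k,t+1} = \lambda^{-m_k(x_t, a_t)}\, \Lambda^{-1}_{k,t} - m_k(x_t, a_t)\, \lambda^{-m_k(x_t, a_t)}\, \frac{\Lambda^{-1}_{k,t}\, x_t x_t^\top\, \Lambda^{-1}_{k,t}}{\lambda^{m_k(x_t, a_t)} + m_k(x_t, a_t)\, x_t^\top \Lambda^{-1}_{k,t}\, x_t}, \tag{9.31}
\]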
where $Q_{t+1}(x_t, a_t)$ is given by (9.26), and $w_{k,0}$ and $\Lambda^{-1}_{k,0}$ are initialised by
$w_{k,0} = 0$ and $\Lambda^{-1}_{k,0} = \delta I$, where $\delta$ is a large scalar. $\lambda$ determines the recency
weighting, which is strongest for $\lambda = 0$, where only the last observation is
considered, and deactivated when $\lambda = 1$.
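A single step of (9.30) and (9.31) can be sketched in plain Python as follows. This transcribes the two equations as printed and is not a reference implementation. The name `rls_update`, the list-of-lists representation of $\Lambda^{-1}_{k,t}$, and the value chosen for $\delta$ in the usage lines are assumptions made for illustration. The sketch requires $\lambda > 0$, since (9.31) divides by $\lambda^{m_k(x_t, a_t)}$, and it relies on $\Lambda^{-1}_{k,t}$ being symmetric, which holds when starting from $\delta I$.

```python
def rls_update(w, P, x, q_target, m, lam):
    """One recency-weighted RLS step, following (9.30) and (9.31) as printed.

    w        -- current weight vector w_{k,t} (list of floats)
    P        -- current Lambda^{-1}_{k,t} (symmetric, list of lists)
    x        -- input vector x_t (list of floats)
    q_target -- target value, standing in for Q_{t+1}(x_t, a_t) from (9.26)
    m        -- matching value m_k(x_t, a_t)
    lam      -- recency weighting lambda, assumed to lie in (0, 1]
    """
    n = len(x)
    decay = lam ** m                                   # lambda^{m_k(x_t, a_t)}
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = decay + m * sum(x[i] * Px[i] for i in range(n))
    # (9.31): since P is symmetric, (P x x^T P)_{ij} = (P x)_i (P x)_j.
    P_new = [[(P[i][j] - m * Px[i] * Px[j] / denom) / decay
              for j in range(n)] for i in range(n)]
    # (9.30): correct the weights using the updated Lambda^{-1}_{k,t+1}.
    error = q_target - sum(wi * xi for wi, xi in zip(w, x))
    P_new_x = [sum(P_new[i][j] * x[j] for j in range(n)) for i in range(n)]
    w_new = [decay * w[i] + m * P_new_x[i] * error for i in range(n)]
    return w_new, P_new


# Usage: w_{k,0} = 0 and Lambda^{-1}_{k,0} = delta * I, with delta = 1e6 as an
# assumed "large scalar"; targets come from the noise-free line q = 3 + 2 * x1.
w = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]
for x1 in [0.0, 1.0, 2.0, 3.0]:
    w, P = rls_update(w, P, [1.0, x1], 3.0 + 2.0 * x1, 1.0, 1.0)
```

With recency weighting deactivated (`lam = 1`), the weights in this usage example approach the least squares solution after a handful of observations, which is the fast convergence that motivates RLS over NLMS here. A non-matching classifier (`m = 0`) leaves both `w` and `P` unchanged.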
Using the RLS algorithm to track the least squares approximation of the
action-values for each classifier allows us to directly track the classifier's model
noise variance, as described in Sect. 5.3.7. More precisely, we track the sum of
 