Information Technology Reference
In-Depth Information
transition matrix that describes the Markov chain probabilities of transiting
from one system state to another. Thus, changes in the environment and the
LCS are tracked simultaneously.
Environmental similarities are exploited in the model by partitioning the Mar-
kov matrix into equivalence classes togetasub-Markovmatrix that collapses
similar states into one. From this, reset times, upper bounds on expected expe-
riment repetition times and other properties can be derived.
The model was created before the emergence of modern RL 4 and so cannot
refer to its theoretical advances, and was not updated to reflect those. Additio-
nally, the inclusion of the LCS state into the model causes the number of states
to be uncountable due to the real-valued parametrisation of LCS. Thus, it is
unclear if the model will provide significant advances in the understanding of
LCS. Rather, one should rely on RL theory to study the performance of LCS
in sequential decision tasks, as discussed in Chap. 9.
2.4.2
Approaches from the Genetic Algorithm Side
As many researchers consider LCS as Genetic-based Machine Learners (GBML),
they are most frequently analysed from the GA perspective. Particularly when
considering single-step problems, when each action is immediately mediated by
a reward, the task is a regression task and does not require an RL component.
Due to its similarity to the LCS model that will be introduced, we will mainly
consider the analyses performed on XCS. Note, however, that none of these
analyses is of direct importance to the work presented here, as they study a
single algorithm that performs a task which is here only define by its aim, rather
than by how it is performed. Nonetheless, the analysis of XCS has given valuable
insights into the set of classifiers that XCS aims at evolving - a topic that is
reconsidered in Sect. 7.1.1.
Single-Step Tasks
Single-step problems are essentially regression tasks where XCS aims at learning
a complete mapping from the input space to the output space. In XCS, such
problems are handled by an RL method that for these tasks reduces to a gradient-
based supervised learning approach, as will be shown in Sects. 5.3.3 and 5.3.4.
Most of the analysis of XCS in single-step tasks has been performed by Butz
et al. in an ongoing effort [51, 53, 44, 56, 49, 46, 50, 58] restricted to binary
string representations, and using a what they call facet-wise approach .Their
approach is to look at single genetic operators, analyse their functionality and
then assemble a bigger picture from the operators' interaction, sometimes taking
simplifying assumptions to make the analysis tractable.
They analyse the various evolutionary pressures in XCS, showing that the set
pressure pushes towards less specific classifiers [53], as already conjectured in
4 “Emergence of modern RL” refers to Sutton's development of TD [207] and Watkin's
Q-Learning [228].
 
Search WWH ::




Custom Search