the model that describes a set of classifiers as a member of the class of parametric models. This includes an introduction to parametric models in Sect. 3.2.1, together with a more detailed definition of the localised classifier models and the global classifier set model in Sects. 3.2.3 and 3.2.4. After discussing how the model structure influences its training and how the model itself relates to Holland's initial LCS idea in Sects. 3.2.6 and 3.2.7, a brief overview is given of how the concepts introduced in this chapter propagate through the chapters to follow.
3.1 Task Definitions
In previous sections the different problem classes that LCS are applied to have
already been described informally. Here, they are formalised to serve as the basis
for further development. We differentiate between regression tasks, classification
tasks, and sequential decision tasks.
Let us assume that we have a finite set of observations generated by noisy
measurements of a stochastic process. All tasks have at their core the formation
of a model that describes a hypothesis for the data-generating process. The process maps an input space X into an output space Y, and so each observation (x, y) of that process is formed by an input x ∈ X that occurred and the associated measured output y ∈ Y of the process in reaction to the input. The set of all inputs X = {x₁, x₂, ...} and associated outputs Y = {y₁, y₂, ...} is called the training set or data D = {X, Y}.

A model of that process provides a hypothesis for the mapping X → Y, induced by the available data.
induced by the available data. Hence, given a new input x , the model can be used
to predict the corresponding output y that the process is expected to generate.
Additionally, an inspection of the hypothesis structure can reveal regularities
within the data. In sequential decision tasks the model represents the structure
of the task and is employed as the basis of decision-making.
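As a concrete illustration of these definitions, the following sketch generates a small training set D = {X, Y} from a hypothetical noisy process. The underlying function f(x) = 2x + 1, the input range, and the noise level are assumptions chosen purely for illustration:

```python
import random

random.seed(42)  # fixed seed, so the sampled data is reproducible

def process(x):
    """Hypothetical data-generating process: a hidden function
    f(x) = 2x + 1 plus zero-mean Gaussian measurement noise."""
    return 2.0 * x + 1.0 + random.gauss(0.0, 0.1)

# Training set D = {X, Y}: N observations (x_n, y_n) of the process.
N = 100
X = [random.uniform(-1.0, 1.0) for _ in range(N)]
Y = [process(x) for x in X]
```

A model fitted to such a D would provide a hypothesis for the mapping X → Y, which could then be used to predict the output y for a previously unseen input x.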
Before going into the similarities and differences between the regression, classification and sequential decision tasks, let us first consider the difficulty of forming good hypotheses about the nature of the data-generating process from only a finite number of observations. For this purpose we assume batch learning, that is, the whole training set with N observations of the form (xₙ, yₙ) is available at once. In a later section, this approach is contrasted with incremental learning, where the model is updated incrementally with each observation.
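The distinction can be made concrete with a deliberately simple "model", namely the mean output of the process; the observations below are made up for illustration:

```python
# Five hypothetical observed outputs y_1, ..., y_5 of some process.
ys = [2.1, 1.9, 2.3, 2.0, 1.7]

# Batch learning: the whole training set is available at once.
batch_mean = sum(ys) / len(ys)

# Incremental learning: the estimate is updated with each observation,
# using the running-mean recursion m_n = m_{n-1} + (y_n - m_{n-1}) / n.
m = 0.0
for n, y in enumerate(ys, start=1):
    m += (y - m) / n
```

For this particular model the incremental update reaches exactly the batch solution; as later chapters discuss, such agreement is not guaranteed for more complex models.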
3.1.1 Expected Risk vs. Empirical Risk

In order to model a data-generating process, one needs to be able to express this process by a smooth stationary function f : X → Y that generates the observation (x, y) by y = f(x) + ε, where ε is a zero-mean random variable. Thus, it needs to be given by a function such that the same expected output is generated for the same input. That is, given two inputs x, x′ such that x = x′, the expected output of the process needs to be the same for both inputs. Were this not the case, then one would be unable to detect any regularities within the process and so it cannot be modelled in any meaningful way.
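The role of the zero-mean noise term ε can be checked numerically: averaging many observations at the same input recovers the expected output f(x). The function f and the noise level below are assumptions for illustration only:

```python
import random

random.seed(0)  # fixed seed for reproducibility

def f(x):
    # Hypothetical smooth stationary function underlying the process.
    return x * x

def observe(x):
    # One noisy observation y = f(x) + eps, with eps ~ N(0, 0.5).
    return f(x) + random.gauss(0.0, 0.5)

# Because eps has zero mean, the same input always has the same
# expected output: the average of many observations at x tends to f(x).
x = 1.5
avg = sum(observe(x) for _ in range(100_000)) / 100_000
```

Here avg lies close to f(1.5) = 2.25, while any single observation may deviate from it considerably.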