constant element (which is usually fixed to 1), which has the same effect. For
example, consider the input space to be the set of reals; that is, $\mathcal{X} = \mathbb{R}$, $D_{\mathcal{X}} = 1$,
and both $x$ and $w$ are scalars. In such a case, the assumption of a linear model
implies that the observed output follows $xw$, which is a straight line through the
origin with slope $w$. To add the bias term, we can instead assume an augmented
input space $\mathcal{X} = \{1\} \times \mathbb{R}$, with input vectors $\mathbf{x} = (1, x)^T$, resulting in the linear
model $\mathbf{w}^T \mathbf{x} = w_1 + w_2 x$, a straight line with slope $w_2$ and bias $w_1$. Equally, the
input vector can be augmented by other elements to extend the expressiveness
of the linear model, as shown in the following example:
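To make the effect of this augmentation concrete, the following sketch (an illustration, not part of the original text) fits a line with and without the bias element by ordinary least squares; the synthetic data and variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=50)
y = 1.5 * x + 0.8 + rng.normal(0.0, 0.1, size=50)  # line with slope 1.5 and bias 0.8

# Without augmentation: X = R, model y = x * w, a line through the origin.
X_plain = x.reshape(-1, 1)
w_plain, *_ = np.linalg.lstsq(X_plain, y, rcond=None)

# With augmentation: x = (1, x)^T, model y = w_1 + w_2 * x.
X_aug = np.column_stack([np.ones_like(x), x])
w_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

print(w_plain)  # single slope, forced through the origin; misfits the data
print(w_aug)    # [bias w_1, slope w_2], close to [0.8, 1.5]
```

The unaugmented fit cannot absorb the offset of 0.8 and distorts its slope estimate to compensate, whereas the augmented model recovers both slope and bias.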
Example 5.1 (Common Classifier Models used in XCS(F)). Initially, classifiers in
XCS [237, 238] only provided a single prediction, independent of the input. Such
behaviour is equivalent to having the scalar input $x_n = 1$ for all $n$, as the weight $w$
then models the output as an average over all matched outputs, as will be demonstrated
in Example 5.2. Hence, such classifiers will be called averaging classifiers.
Later, Wilson introduced XCSF (the F standing for "function"), which initially
used straight lines as the local models [241]. Hence, in the one-dimensional case,
the inputs are given by $\mathbf{x}_n = (1, i_n)^T$ to model the output by $w_1 + w_2 i_n$, where
$i_n$ is the variable part of the input. This concept was taken further by Lanzi
et al. [141] by applying 2nd and 3rd order polynomials, using the input vectors
$\mathbf{x}_n = (1, i_n, i_n^2)^T$ and $\mathbf{x}_n = (1, i_n, i_n^2, i_n^3)^T$ respectively. Naturally, the input vector
does not need to be restricted to taking $i_n$ to some power, but allows for the
use of arbitrary functions. These functions are known as basis functions, as they
form the basis of the input space. Nonetheless, increasing the complexity of
the input space makes it harder to interpret the local models. Hence, if the
aim is to understand the localised model, these models should be kept simple,
such as straight lines.
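The three classifier models of Example 5.1 differ only in how the input vector is constructed. The sketch below (hypothetical code, not from the original text) builds the corresponding design matrices and fits each model by least squares:

```python
import numpy as np

def design_matrix(i, degree):
    """Stack input vectors x_n = (1, i_n, i_n^2, ..., i_n^degree)^T as rows.
    degree=0 yields averaging classifiers, degree=1 straight lines, etc."""
    return np.column_stack([i ** k for k in range(degree + 1)])

rng = np.random.default_rng(1)
i = rng.uniform(0.0, 1.0, size=100)                    # variable part of the input
y = np.sin(2 * np.pi * i) + rng.normal(0.0, 0.05, 100)  # noisy nonlinear target

for degree, name in [(0, "averaging"), (1, "straight line"), (3, "3rd-order poly")]:
    X = design_matrix(i, degree)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(name, "weights:", np.round(w, 3))
```

With degree 0 the design matrix is a column of ones, so the single fitted weight is simply the mean of the matched outputs, exactly the behaviour of an averaging classifier.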
5.1.2 Gaussian Noise
The noise term captures the stochasticity of the data-generating process and
the measurement noise. In the case of linear models, the inputs and outputs
are assumed to stand in a linear relation. Every deviation from this relation
is captured by $\epsilon$ and is interpreted as noise. Hence, assuming the absence of
measurement noise, the fluctuation of $\epsilon$ gives information about the adequacy of
assuming a linear model. In other words, if the variance of $\epsilon$ is small, then inputs
and outputs do indeed follow a linear relation. Hence, the variance of $\epsilon$ can be
used as a measure of how well the local model fits the data. For that reason, the
aim is not only to find a weight vector that maximises the likelihood, but also
to simultaneously estimate the variance of $\epsilon$.
For linear models it is common to assume that the random variable $\epsilon$ representing
the noise has zero mean, constant variance, and follows a normal
distribution [97], that is, $\epsilon \sim \mathcal{N}(0, \tau^{-1})$, where $\tau$ is the noise precision (inverse
noise variance). Hence, in combination with (5.1), and for some realisation $\mathbf{w}$ of
$\boldsymbol{\omega}$ and input $\mathbf{x}$, the output is modelled by

$$p(y \,|\, \mathbf{x}, \mathbf{w}, \tau) = \mathcal{N}(y \,|\, \mathbf{w}^T \mathbf{x}, \tau^{-1}) = \left(\frac{\tau}{2\pi}\right)^{1/2} \exp\left(-\frac{\tau}{2}\left(\mathbf{w}^T \mathbf{x} - y\right)^2\right). \tag{5.3}$$
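The density (5.3) translates directly into code. The sketch below (an illustration under the stated model assumptions; data and names are hypothetical) evaluates it for given $\mathbf{w}$ and $\tau$, and estimates $\tau$ as the inverse variance of the residuals of a least-squares fit, in line with using the noise variance as a measure of fit quality:

```python
import numpy as np

def gaussian_likelihood(y, x, w, tau):
    """p(y | x, w, tau) = N(y | w^T x, tau^{-1}), as in (5.3)."""
    return np.sqrt(tau / (2.0 * np.pi)) * np.exp(-0.5 * tau * (w @ x - y) ** 2)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.uniform(-1, 1, 200)])  # x_n = (1, i_n)^T
w_true = np.array([0.5, 2.0])
y = X @ w_true + rng.normal(0.0, 0.2, size=200)  # noise std 0.2, so tau = 25

w_ml, *_ = np.linalg.lstsq(X, y, rcond=None)     # maximum-likelihood weights
residuals = y - X @ w_ml
tau_ml = 1.0 / residuals.var()                   # precision = inverse noise variance
print(np.round(w_ml, 3), round(tau_ml, 1))       # w near (0.5, 2.0), tau near 25

print(gaussian_likelihood(y[0], X[0], w_ml, tau_ml))  # density of one observation
```

A large estimated $\tau$ (small residual variance) indicates that the linear model describes the local data well; a small $\tau$ signals systematic deviation from linearity.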