Since each classifier is trained independently (see Sect. 4.4), this chapter focuses exclusively on the training of a single classifier k. To keep the notation uncluttered, the subscript k is dropped; that is, the classifier's matching function m_k is denoted m, the model parameters θ_k = {w_k, τ_k} become w and τ, and the estimate f_k provided by classifier k is denoted f. For any further variables introduced throughout this chapter, it will be stated explicitly whether they are local to a classifier.
Firstly, the linear regression classifier model and its underlying assumptions are introduced, followed in Sect. 5.2 by how to estimate its parameters if all training data is available at once. Incremental learning approaches are discussed in Sect. 5.3, where gradient-based and exact methods of tracking the optimal weight vector estimate are described. Estimating the noise variance simultaneously is discussed for both methods in Sect. 5.3.7. In Sect. 5.4, the slow convergence of gradient-based methods is demonstrated empirically. Turning to classification, the training of these models is discussed in Sect. 5.5, after which the chapter is summarised by putting its content into the context of current LCS.
5.1 Linear Classifier Models and Their Underlying Assumptions
Linear regression models were chosen as a good balance between the expressiveness of the model and the ease of training it (see Sect. 3.2.3). The univariate linear model has already been introduced in Sect. 4.2.1, but here its underlying assumptions and implications are considered in more detail.
5.1.1 Linear Models
A linear model assumes a linear relation between the inputs and the output, parametrised by a set of model parameters. Given an input vector x with D_X elements, the model is parametrised by the equally-sized random vector ω with realisation w, and assumes that the scalar output random variable υ with realisation y follows the relation

υ = ω^T x + ε,     (5.1)
where ε is a zero-mean Gaussian random variable that models the stochasticity of the process and the measurement noise. Hence, ignoring for now the noise term ε, it is assumed that the process generates the output by a weighted sum of the components of the input, as becomes very clear when considering a realisation w of ω and rewriting the inner product as

w^T x = Σ_i w_i x_i,     (5.2)

where w_i and x_i are the i-th elements of w and x, respectively.
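To make this generative assumption concrete, the following minimal Python sketch samples an output according to (5.1). The variable names are illustrative, and it is assumed here, as one common convention, that τ denotes the noise precision, so that ε has variance 1/τ:

```python
import numpy as np

rng = np.random.default_rng(0)

w = np.array([0.5, -1.2, 2.0])  # a realisation w of the weight vector omega
tau = 4.0                       # assumed noise precision; noise variance is 1/tau

def sample_output(x, w, tau, rng):
    """Draw y = w^T x + eps with eps ~ N(0, 1/tau), following Eq. (5.1)."""
    eps = rng.normal(loc=0.0, scale=np.sqrt(1.0 / tau))
    return w @ x + eps          # inner product w^T x = sum_i w_i x_i, Eq. (5.2)

x = rng.normal(size=3)          # an arbitrary input vector with D_X = 3 elements
y = sample_output(x, w, tau, rng)
```

Averaged over many draws, the sampled outputs scatter around the noise-free weighted sum w^T x with spread determined by the noise term.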
While linear models are usually augmented by a bias term to offset them from the origin, it will be assumed that the input vector always contains a single constant element (for example, fixed to 1), so that the weight associated with it takes the role of the bias term.
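Continuing the sketch above, absorbing the bias into the weight vector then amounts to prepending such a constant element to every input; the position and value used here are one common convention, not prescribed by the text:

```python
def add_bias(x):
    """Prepend a constant 1 so that the first weight acts as the bias."""
    return np.concatenate(([1.0], x))

x_aug = add_bias(x)                  # now has D_X + 1 elements
w_aug = np.concatenate(([0.7], w))   # first element is the (assumed) bias weight
y_aug = sample_output(x_aug, w_aug, tau, rng)
```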