which defines the probabilistic model of a linear regression and forms the core
of its investigation.
That the assumption of Gaussian noise is sensible is discussed at length by
Maybeck [164, Chap. 1].
5.1.3 Maximum Likelihood and Least Squares
To model the matched observations, a classifier aims at maximising the probability of these observations given its model, as formally described by (4.24).
Combined with the linear model (5.3), the term to maximise by a single classifier k is given by

$$
\sum_{n=1}^{N} m(x_n) \ln p(y_n \mid x_n, w, \tau^{-1})
= \sum_{n=1}^{N} m(x_n) \left( -\frac{1}{2} \ln(2\pi) + \frac{1}{2} \ln \tau - \frac{\tau}{2} \left( w^T x_n - y_n \right)^2 \right). \tag{5.4}
$$
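As a concrete illustration, here is a minimal NumPy sketch of evaluating (5.4); the function name matched_log_likelihood and its signature are ours, not from the text:

```python
import numpy as np

def matched_log_likelihood(X, y, m, w, tau):
    """Evaluate (5.4) for a single classifier.

    X   : (N, D) matrix with one input vector x_n per row
    y   : (N,) vector of outputs y_n
    m   : (N,) matching values m(x_n)
    w   : (D,) weight vector
    tau : scalar noise precision
    """
    sq_err = (X @ w - y) ** 2  # (w^T x_n - y_n)^2 for all n
    ll_per_obs = -0.5 * np.log(2 * np.pi) + 0.5 * np.log(tau) - 0.5 * tau * sq_err
    return np.sum(m * ll_per_obs)
```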
As already shown in Sect. 4.2.1, maximising (5.4) with respect to the weight vector $w$ results in the weighted least squares problem,

$$
\min_{w} \sum_{n=1}^{N} m(x_n) \left( w^T x_n - y_n \right)^2, \tag{5.5}
$$

where the weights are given by the classifier's matching function. Thus, to determine $w$ by maximum likelihood, we only consider observations for which $m(x_n) > 0$, that is, which are matched.
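A minimal NumPy sketch of the batch solution to (5.5) follows; the function name fit_weights is ours. Scaling each input and output by the square root of its matching value reduces the weighted problem to ordinary least squares, which a standard solver handles directly:

```python
import numpy as np

def fit_weights(X, y, m):
    """Solve the weighted least squares problem (5.5).

    Multiplying row x_n and output y_n by sqrt(m(x_n)) turns
    sum_n m(x_n) (w^T x_n - y_n)^2 into an ordinary least
    squares objective.
    """
    s = np.sqrt(m)
    w, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
    return w
```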
To determine the noise precision of the fitted model, we maximise (5.4) with respect to $\tau$, resulting in the problem

$$
\max_{\tau} \left( \ln(\tau) \sum_{n=1}^{N} m(x_n) - \tau \sum_{n=1}^{N} m(x_n) \left( w^T x_n - y_n \right)^2 \right), \tag{5.6}
$$
where w is the weight vector determined by (5.5).
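Setting the derivative of (5.6) with respect to $\tau$ to zero yields $\tau^{-1} = \sum_n m(x_n)(w^T x_n - y_n)^2 / \sum_n m(x_n)$, that is, the inverse of the match-weighted mean squared error. A corresponding sketch (again, the function name is illustrative):

```python
import numpy as np

def fit_noise_precision(X, y, m, w):
    """Maximise (5.6): tau is the inverse of the match-weighted
    mean squared error of the fitted linear model."""
    sq_err = (X @ w - y) ** 2
    return np.sum(m) / np.sum(m * sq_err)
```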
The rest of this chapter is devoted to discussing batch and incremental learning solutions to (5.5) and (5.6), starting with batch learning.
5.2 Batch Learning Approaches to Regression
When performing batch learning, all data $\mathcal{D} = \{x_n, y_n\}$ is assumed to be available at once (see Sect. 3.1.5). Hence, we have full knowledge of $\{x_n, y_n\}$, $N$, and, knowing the current model structure $\mathcal{M}$, also of the classifier's matching function $m$.
Let us now apply this approach to find the classifier's model parameters by
solving (5.5) and (5.6).
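Before turning to the details, a brief synthetic-data example of the two sketches above (the data and the chosen matching function are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
X = np.column_stack([np.ones_like(x), x])      # bias term plus input
y = 2.0 - 3.0 * x + rng.normal(0.0, 0.1, 200)  # true noise std 0.1
m = (x < 0.5).astype(float)                    # classifier matches x < 0.5 only

w = fit_weights(X, y, m)               # approx. [2, -3]
tau = fit_noise_precision(X, y, m, w)  # approx. 1 / 0.1**2 = 100
```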
Notation

The following notation is used in this and the remaining chapters: let $x, y \in \mathbb{R}^M$ be vectors, and $A \in \mathbb{R}^{M \times M}$ a diagonal matrix. Let $\langle x, y \rangle = x^T y$ be the inner product.