Therefore, the model can be written as $g = \Xi w$, and the least squares
cost function becomes
$$J(w) = \sum_{k=1}^{N} \left( y_p^k - g(\zeta^k, w) \right)^2 = \left\| y_p - \Xi w \right\|^2 = (y_p - \Xi w)^T (y_p - \Xi w).$$
In order to find the vector of parameters for which that function is minimum,
one just has to write that the gradient of the cost function with respect to
the parameters is equal to zero, and to solve the system of equations thus
obtained. Since the cost function is quadratic with respect to the parameters,
the gradient is linear with respect to the parameters. Therefore, the system
of equations (called normal equations) is linear; its solution $w_{LS}$ is the least
squares estimate of the parameters of the model,
$$\Xi^T \Xi \, w_{LS} = \Xi^T y_p.$$
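The step from the cost function to the normal equations can be spelled out as follows. This is the standard derivation written with the symbols used above, offered as a sketch rather than as the author's own presentation.

```latex
\begin{align*}
J(w) &= (y_p - \Xi w)^T (y_p - \Xi w)
      = y_p^T y_p - 2\, w^T \Xi^T y_p + w^T \Xi^T \Xi\, w, \\
\nabla_w J(w) &= -2\, \Xi^T y_p + 2\, \Xi^T \Xi\, w
               = -2\, \Xi^T (y_p - \Xi w), \\
\nabla_w J(w_{LS}) &= 0 \;\Longrightarrow\; \Xi^T \Xi\, w_{LS} = \Xi^T y_p .
\end{align*}
```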
If the number of examples $N$ is much larger than the number of inputs $q$,
matrix $\Xi$ is generally of rank $q$ (i.e., $q$ rows of $\Xi$ are linearly independent).
If $\Xi$ has rank $q$, then it can be proved that $\Xi^T \Xi$ also has rank $q$, hence is
invertible. In that case, the unique least squares solution is readily obtained
as
$$w_{LS} = (\Xi^T \Xi)^{-1} \Xi^T y_p.$$
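The closed-form solution can be tried numerically. The following is a minimal sketch in Python with NumPy, assuming a full-rank matrix; the function name `least_squares_normal_equations` and the synthetic data are illustrative assumptions, not part of the text. It solves the normal equations as a linear system instead of forming the inverse explicitly, and compares the result with NumPy's own least squares routine.

```python
import numpy as np

def least_squares_normal_equations(Xi, y_p):
    """Solve Xi^T Xi w = Xi^T y_p for w (assumes Xi has full column rank)."""
    gram = Xi.T @ Xi          # q x q matrix Xi^T Xi
    rhs = Xi.T @ y_p          # right-hand side Xi^T y_p
    # Solving the linear system avoids computing the explicit inverse.
    return np.linalg.solve(gram, rhs)

# Hypothetical data: N = 50 examples, q = 3 inputs, noise-free outputs.
rng = np.random.default_rng(0)
Xi = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_p = Xi @ w_true

w_ls = least_squares_normal_equations(Xi, y_p)

# Reference solution from NumPy's least squares solver.
w_ref, *_ = np.linalg.lstsq(Xi, y_p, rcond=None)
```

With noise-free data generated by a linear model, `w_ls` should agree with `w_true` up to rounding error. In practice, `np.linalg.lstsq` is usually preferable when the matrix is poorly conditioned, since forming $\Xi^T \Xi$ squares the condition number.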
By contrast, if the number of experiments is too low ($N < q$), matrix $\Xi$ has
rank at most $N$, hence smaller than $q$, so that the problem has an infinite number of solutions.
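The rank-deficient case can be illustrated with a small numerical sketch. The matrix and outputs below are hypothetical values chosen for the illustration. With $N = 2$ and $q = 3$, two different parameter vectors reach the same cost: the one returned by `np.linalg.lstsq` (which picks the minimum-norm solution) and that vector shifted along the null space of the matrix.

```python
import numpy as np

# Hypothetical underdetermined problem: N = 2 examples, q = 3 inputs.
Xi = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 1.0]])
y_p = np.array([1.0, 2.0])

rank = np.linalg.matrix_rank(Xi)              # at most N = 2, smaller than q = 3
gram_rank = np.linalg.matrix_rank(Xi.T @ Xi)  # Xi^T Xi is 3 x 3 but singular

# np.linalg.lstsq returns the minimum-norm least squares solution.
w_min_norm, *_ = np.linalg.lstsq(Xi, y_p, rcond=None)

# Any vector in the null space of Xi can be added without changing the fit.
null_vec = np.array([1.0, 1.0, -1.0])         # Xi @ null_vec is the zero vector
w_other = w_min_norm + 3.0 * null_vec

cost_min_norm = np.sum((y_p - Xi @ w_min_norm) ** 2)
cost_other = np.sum((y_p - Xi @ w_other) ** 2)
```

Both costs coincide, so the data alone cannot single out one parameter vector; an additional criterion (here, minimum norm) is needed to choose among them.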
For an input vector $\zeta$, the prediction of the model is given by
$g(\zeta, w_{LS}) = \zeta^T w_{LS}$. The vector of the predictions of the model related to the training
examples is $g = \Xi w_{LS}$, and the vector of residuals (modeling errors
on the training examples) is thus
$$r = y_p - \Xi w_{LS}.$$
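Predictions and residuals follow directly from these definitions. The sketch below assumes hypothetical noisy data and a helper named `predictions_and_residuals`, neither of which comes from the text. It also checks a consequence of the normal equations: since $\Xi^T (y_p - \Xi w_{LS}) = 0$, the residual vector is orthogonal to every column of $\Xi$.

```python
import numpy as np

def predictions_and_residuals(Xi, y_p, w_ls):
    """Return the prediction vector g = Xi w_ls and the residuals r = y_p - g."""
    g = Xi @ w_ls
    r = y_p - g
    return g, r

# Hypothetical noisy data: a constant input equal to 1 plus one variable.
rng = np.random.default_rng(1)
Xi = np.column_stack([np.ones(20), rng.uniform(-1.0, 1.0, size=20)])
y_p = 0.5 + 2.0 * Xi[:, 1] + 0.1 * rng.normal(size=20)

w_ls, *_ = np.linalg.lstsq(Xi, y_p, rcond=None)
g, r = predictions_and_residuals(Xi, y_p, w_ls)

# Rewriting the normal equations gives Xi^T r = 0 at the least squares solution.
orthogonality = Xi.T @ r
```

The entries of `orthogonality` should vanish up to rounding error, even though the residuals themselves are nonzero because of the noise.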
Example
The following is a very simple didactic example: a linear model must be
designed, with a single variable $x$ (hence two inputs: the variable $x$ and a
constant input, equal to 1), from three observations. The three measured values
of variable $x$ are denoted by $\{x^1, x^2, x^3\}$, and the measured values of the
quantity to be modeled by $\{y_p^1, y_p^2, y_p^3\}$. Thus, with the above notations, the
input vector is
$$\zeta = \begin{pmatrix} 1 \\ x \end{pmatrix}.$$
The output vector is
$$y_p = \begin{pmatrix} y_p^1 \\ y_p^2 \\ y_p^3 \end{pmatrix}.$$
The vector of parameters is
$$w = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}.$$
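The setup of this example can be sketched in code. The three measurements below are hypothetical numbers, chosen to lie exactly on the line $y = 1 + 2x$ so that the result is easy to verify by hand. Each row of the matrix holds the constant input 1 followed by a measured value of $x$, so the first parameter plays the role of the intercept and the second that of the slope.

```python
import numpy as np

# Hypothetical measurements (not from the text): three values of x and
# three outputs that lie exactly on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0])
y_p = np.array([1.0, 3.0, 5.0])

# One row per observation: constant input 1, then the measured x.
Xi = np.column_stack([np.ones_like(x), x])   # shape (3, 2)

# Normal equations: Xi^T Xi = [[3, 3], [3, 5]] and Xi^T y_p = [9, 13].
w_ls = np.linalg.solve(Xi.T @ Xi, Xi.T @ y_p)

# Residuals are zero here because the data are exactly linear.
r = y_p - Xi @ w_ls
```

Solving the two normal equations by hand gives an intercept of 1 and a slope of 2, which is what `w_ls` should contain up to rounding error.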