parameters $w_{N+1}$ and $\tau_{N+1}$. The following notation will be used: $X_N$, $y_N$, $M_N$, and $c_N$ denote the input matrix, output vector, matching matrix, and match count, respectively, after $N$ observations. Similarly, $X_{N+1}$, $y_{N+1}$, $M_{N+1}$, $c_{N+1}$ stand for the same objects after the additional observation $(x_{N+1}, y_{N+1})$ is known.
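As a concrete illustration of this bookkeeping, the following minimal sketch (assuming NumPy; the function and variable names are hypothetical, the matching function m is assumed to return the degree to which an input is matched, and the match count is assumed to accumulate these matching degrees) shows how the four objects grow with one additional observation:

    import numpy as np

    def add_observation(X_N, y_N, m_diag_N, c_N, x_new, y_new, m):
        """Update X_N, y_N, M_N (stored as its diagonal) and c_N
        with the additional observation (x_new, y_new)."""
        X_next = np.vstack([X_N, x_new])             # X_{N+1}: input appended as a new row
        y_next = np.append(y_N, y_new)               # y_{N+1}: output appended
        m_diag_next = np.append(m_diag_N, m(x_new))  # diagonal of M_{N+1}
        c_next = c_N + m(x_new)                      # c_{N+1}: match count accumulates m(x_{N+1})
        return X_next, y_next, m_diag_next, c_next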
Several methods can be used to perform the model parameter update, ranging from computationally simple gradient-based approaches to more complex, but also more stable, methods. Since quickly obtaining a good estimate of the quality of a classifier's model is important, and since the noise precision quality measure of (5.6) relies on the weight estimate, the speed of convergence in estimating both $w$ and $\tau$ needs to be considered in addition to the computational costs of the methods.
Firstly, a well-known principle from adaptive filter theory concerning the optimality of incremental linear models will be derived. Then some gradient-based approaches are considered, followed by approaches that recursively track the least squares solution. All of this concerns only the update of the weight vector $w$; similar methods will be applied to the noise precision $\tau$ in Sect. 5.3.7.
5.3.1 The Principle of Orthogonality
The Principle of Orthogonality determines when the weight vector estimate $w_N$ is optimal in the weighted least squares sense of (5.5):

Theorem 5.3 (Principle of Orthogonality (for example, [105])). The weight vector estimate $w_N$ after $N$ observations is optimal in the sense of (5.5) if the sequence of inputs $\{x_1, \dots, x_N\}$ is $M_N$-orthogonal to the sequence of estimation errors $\{(w_N^\top x_1 - y_1), \dots, (w_N^\top x_N - y_N)\}$, that is,

\langle X_N, X_N w_N - y_N \rangle_{M_N} = \sum_{n=1}^{N} m(x_n)\, x_n \left( w_N^\top x_n - y_n \right) = 0.    (5.16)
Proof. The solution of (5.5) is found by setting the first derivative of (5.7) to zero, which gives

2 X_N^\top M_N X_N w_N - 2 X_N^\top M_N y_N = 0.

The result follows from rearranging the expression.
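The condition (5.16) is easy to check numerically: compute the batch weighted least squares solution of (5.5) and verify that the inputs are $M_N$-orthogonal to the resulting estimation errors. A minimal sketch, assuming NumPy and randomly generated data:

    import numpy as np

    rng = np.random.default_rng(0)
    N, D = 50, 3
    X = rng.normal(size=(N, D))      # X_N: one input vector per row
    y = rng.normal(size=N)           # y_N: outputs
    m = rng.uniform(size=N)          # matching degrees, diagonal of M_N
    M = np.diag(m)

    # Batch solution of (5.5): w_N = (X^T M_N X)^{-1} X^T M_N y
    w = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)

    # Principle of Orthogonality (5.16): X^T M_N (X w_N - y) = 0
    print(np.allclose(X.T @ M @ (X @ w - y), 0.0))   # True, up to numerical precision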
By pre-multiplying (5.16) by $w_N^\top$, a similar statement can be made about the output estimates:
Corollary 5.4 (Corollary to the Principle of Orthogonality (for example, [105])). The weight vector estimate $w_N$ after $N$ observations is optimal in the sense of (5.5) if the sequence of output estimates $\{w_N^\top x_1, \dots, w_N^\top x_N\}$ is $M_N$-orthogonal to the sequence of estimation errors $\{(w_N^\top x_1 - y_1), \dots, (w_N^\top x_N - y_N)\}$, that is,

\langle X_N w_N, X_N w_N - y_N \rangle_{M_N} = \sum_{n=1}^{N} m(x_n) \left( w_N^\top x_n \right) \left( w_N^\top x_n - y_n \right) = 0.    (5.17)
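To make the step behind the corollary explicit: pre-multiplying the vector equation (5.16) by $w_N^\top$ turns each input $x_n$ into the scalar output estimate $w_N^\top x_n$,

w_N^\top \sum_{n=1}^{N} m(x_n)\, x_n \left( w_N^\top x_n - y_n \right)
    = \sum_{n=1}^{N} m(x_n) \left( w_N^\top x_n \right) \left( w_N^\top x_n - y_n \right) = 0,

which is exactly (5.17).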