parameters w_{N+1} and τ_{N+1}. The following notation will be used: X_N, y_N, M_N, and c_N denote the input, output, matching matrix, and match count respectively, after N observations. Similarly, X_{N+1}, y_{N+1}, M_{N+1}, c_{N+1} stand for the same objects after knowing the additional observation (x_{N+1}, y_{N+1}).
Several methods can be used to perform the model parameter update, ranging from computationally simple gradient-based approaches to more complex, but also more stable, methods. Since quickly obtaining a good idea of the quality of a classifier's model is important, and since the noise precision quality measure after (5.6) relies on the weight estimate, the speed of convergence in estimating both w and τ needs to be considered in addition to the computational costs of the methods.
First, a well-known principle from adaptive filter theory concerning the optimality of incremental linear models will be derived. Then some gradient-based approaches are considered, followed by approaches that recursively track the least-squares solution. All of this concerns only the update of the weight vector w; similar methods are applied to the noise precision τ in Sect. 5.3.7.
5.3.1 The Principle of Orthogonality

The Principle of Orthogonality determines when the weight vector estimate w_N is optimal in the weighted least squares sense of (5.5):
Theorem 5.3 (Principle of Orthogonality (for example, [105])). The weight vector estimate w_N after N observations is optimal in the sense of (5.5) if the sequence of inputs {x_1, ..., x_N} is M_N-orthogonal to the sequence of estimation errors {(w_N^T x_1 − y_1), ..., (w_N^T x_N − y_N)}, that is,

    ⟨X_N, X_N w_N − y_N⟩_{M_N} = Σ_{n=1}^{N} m(x_n) x_n (w_N^T x_n − y_n) = 0.    (5.16)
Proof. The solution of (5.5) is found by setting the first derivative of (5.7) to zero, which gives

    2 X_N^T M_N X_N w_N − 2 X_N^T M_N y_N = 0.

The result follows from rearranging the expression.
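As a quick numerical sanity check of (5.16), not taken from the text, the sketch below builds a random data set, forms a diagonal matching matrix M_N from assumed matching values m(x_n) in (0, 1], solves the weighted least-squares problem via the normal equations, and verifies that the M_N-weighted residual is orthogonal to the inputs. All variable names and the synthetic data are illustrative assumptions.

```python
# Sketch: numerically verify the Principle of Orthogonality (5.16).
# Assumed setup: N synthetic observations, matching values m(x_n) in (0, 1].
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 3
X = rng.normal(size=(N, d))            # X_N: one input x_n per row
y = rng.normal(size=N)                 # y_N: outputs
m = rng.uniform(0.1, 1.0, size=N)      # m(x_n): matching values
M = np.diag(m)                         # M_N: diagonal matching matrix

# Weighted least-squares estimate: solve X^T M X w = X^T M y
w = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)

# Condition (5.16): sum_n m(x_n) x_n (w^T x_n - y_n) should vanish
residual = X.T @ M @ (X @ w - y)
print(np.allclose(residual, 0.0))      # True, up to floating-point precision
```

Because w_N solves the normal equations exactly, the weighted residual vector is zero up to rounding error, which is precisely the M_N-orthogonality the theorem states.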
By multiplying (5.16) from the left by w_N^T, a similar statement can be made about the output estimates:
Corollary 5.4 (Corollary to the Principle of Orthogonality (for example, [105])). The weight vector estimate w_N after N observations is optimal in the sense of (5.5) if the sequence of output estimates {w_N^T x_1, ..., w_N^T x_N} is M_N-orthogonal to the sequence of estimation errors {(w_N^T x_1 − y_1), ..., (w_N^T x_N − y_N)}, that is,

    ⟨X_N w_N, X_N w_N − y_N⟩_{M_N} = Σ_{n=1}^{N} m(x_n) (w_N^T x_n)(w_N^T x_n − y_n) = 0.    (5.17)
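The corollary can be checked numerically in the same way; since (5.17) is (5.16) left-multiplied by w_N^T, the check reduces to a single scalar. The sketch below uses the same kind of assumed synthetic setup (random data, diagonal matching matrix) and is not taken from the text.

```python
# Sketch: numerically verify Corollary 5.4 / equation (5.17).
# Assumed synthetic setup, mirroring the check of Theorem 5.3.
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 3
X = rng.normal(size=(N, d))
y = rng.normal(size=N)
M = np.diag(rng.uniform(0.1, 1.0, size=N))   # M_N from matching values

w = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)

# (5.17): sum_n m(x_n) (w^T x_n)(w^T x_n - y_n) -- a scalar, since it is
# (5.16) multiplied from the left by w_N^T
s = (X @ w) @ M @ (X @ w - y)
print(abs(s) < 1e-8)                          # True: the sum vanishes
```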