The Gaussian prior on $\omega$ provides a different interpretation of the ridge
complexity $\lambda$ in ridge regression: recalling that $\lambda$ corresponds to initialising
RLS with $\Lambda_0^{-1} = \lambda^{-1} I$, it is also equivalent to using the Kalman filter with
the prior $\omega_0 \sim \mathcal{N}(0, (\lambda\tau)^{-1} I)$. Hence, ridge regression assumes the weight
vector to be centred on $0$, with an independent variance of $(\lambda\tau)^{-1}$ for each
element of this vector. As the prior covariance is proportional to the real
noise variance $\tau^{-1}$, a smaller noise variance causes stronger shrinkage due to a
more informative prior.
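To make this correspondence concrete, the following short derivation (a sketch in
standard Bayesian linear-model notation, shown for the fully matched case) confirms
that the posterior mean under this prior is exactly the ridge regression estimate.
The prior and likelihood

    \omega_0 \sim \mathcal{N}\left(0, (\lambda\tau)^{-1} I\right),
    \qquad
    y \mid \omega \sim \mathcal{N}\left(X\omega, \tau^{-1} I\right)

yield a Gaussian posterior over $\omega$ with mean

    \left(\lambda\tau I + \tau X^\top X\right)^{-1} \tau X^\top y
    = \left(X^\top X + \lambda I\right)^{-1} X^\top y,

which is the ridge regression solution; $\tau$ cancels precisely because the prior
covariance $(\lambda\tau)^{-1} I$ scales with $\tau^{-1}$.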
What if the noise distribution is not Gaussian? Would that invalidate the
approach taken by RLS and the Kalman filter? Fortunately, the Gauss-Markov
Theorem (for example, [97]) states that the least squares estimate is optimal
independent of the shape of the noise distribution, as long as its variance is
constant over all observations. Nonetheless, adding the assumption of Gaussian
noise and acquiring a Gaussian model for the weight vector allows us to specify
the predictive density. Without these assumptions, we would be unable to make
any statements about this density, and would consequently also be unable to
provide a measure of the prediction confidence.
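For completeness: with a Gaussian posterior $\omega \sim \mathcal{N}(w_N, \Sigma_N)$ and Gaussian
noise of variance $\tau^{-1}$ (generic notation, not necessarily the symbols used
elsewhere in this chapter), the predictive density at a new input $x$ takes the
standard form

    y \mid x \sim \mathcal{N}\left(w_N^\top x, \; \tau^{-1} + x^\top \Sigma_N x\right),

whose variance separates into the irreducible noise $\tau^{-1}$ and the parameter
uncertainty $x^\top \Sigma_N x$; it is this second term that quantifies the
prediction confidence.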
In summary, demonstrating the formal equivalence between the RLS algo-
rithm and the Kalman filter for a stationary system state has significantly
deepened our understanding of the assumptions underlying the RLS method,
and provides intuitive interpretations of matching and recency-weighting by
relating them to an increased uncertainty about the observations.
5.3.7 Incremental Noise Precision Estimation
So far, the discussion of the incremental methods has focused on estimating the
weight vector that solves (5.5). Let us now consider how we can estimate the
noise precision by incrementally solving (5.6).
For batch learning it was already demonstrated that (5.11) and (5.13) pro-
vide a biased and an unbiased noise precision estimate, respectively, that solve
(5.6). The same solutions remain valid when using an incremental approach,
and thus, after N observations,
    \tau_N^{-1} = c_N^{-1} \left\| X_N w_N - y_N \right\|_{M_N}^2
    \qquad (5.62)
provides a biased estimate of the noise precision, and
    \tau_N^{-1} = \left(c_N - D_{\mathcal{X}}\right)^{-1} \left\| X_N w_N - y_N \right\|_{M_N}^2
    \qquad (5.63)
is the unbiased estimate. Ideally, $w_N$ is the weight vector that satisfies the
Principle of Orthogonality, but if gradient-based methods are utilised, we are
forced to rely on the current (possibly quite wrong) estimate.
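As a concrete illustration, the following minimal NumPy sketch evaluates both
estimators directly from their definitions. It assumes that m holds the matching
values m(x_n), so that c_N is their sum and the norm is the matching-weighted
squared norm; the variable and function names are this sketch's own, not the
book's:

import numpy as np

def noise_precision_estimates(X, y, w, m):
    """Evaluate the biased (5.62) and unbiased (5.63) estimates.

    X : (N, D_X) input matrix; y : (N,) outputs; w : (D_X,) weight
    vector; m : (N,) matching values m(x_n). Both returned values
    estimate the noise variance tau^{-1}.
    """
    residuals = X @ w - y              # per-observation prediction errors
    weighted_sq = m @ residuals**2     # ||X_N w_N - y_N||^2 weighted by M_N
    c_N = m.sum()                      # match count c_N
    D_X = X.shape[1]                   # dimensionality of the input space
    return weighted_sq / c_N, weighted_sq / (c_N - D_X)

# Example: fully matched linear model with true noise variance 0.25.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=100)
m = np.ones(100)
s = np.sqrt(m)                         # match-weighted least squares via sqrt(m)
w, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
print(noise_precision_estimates(X, y, w, m))   # both close to 0.25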
Let us first derive a gradient-based method for estimating the noise preci-
sion, which is the one applied in XCS. Following that, a much more accurate
approach is introduced that can be used alongside the RLS algorithm to track
the exact noise precision estimate (5.63) for each additional observation.
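The precise form of the gradient-based update is developed in what follows;
purely to fix the idea, an LMS-style exponential-averaging sketch might look
as follows. The update rule, the step size gamma, and the function name are
assumptions of this illustration, not the text's definitions:

def lms_noise_variance_update(tau_inv, x, y, w, m_x, gamma=0.05):
    """One hypothetical LMS-style step for the noise variance estimate.

    Moves tau_inv a fraction gamma * m_x towards the current squared
    residual (w^T x - y)^2, so the estimate tracks a recency-weighted
    average of squared prediction errors.
    """
    prediction = sum(w_i * x_i for w_i, x_i in zip(w, x))  # w^T x
    sq_error = (prediction - y) ** 2
    return tau_inv + gamma * m_x * (sq_error - tau_inv)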