corresponds to the RLS algorithm that uses (5.35), and the inverse covariance
form is equivalent to using (5.31). They also share the same characteristics: while
(5.35) is computationally cheaper, it cannot be used with a non-informative
prior, just like the covariance form. Conversely, using (5.31) allows the use of
non-informative priors, but requires a matrix inversion with every additional
update, as does the inverse covariance form to recover $\mathbf{w}$ by $\mathbf{w} = \boldsymbol{\Lambda}^{-1}(\boldsymbol{\Lambda}\mathbf{w})$,
making it computationally more expensive.
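To make the trade-off concrete, here is a minimal sketch of the two forms, assuming the standard matching-weighted RLS updates $\boldsymbol{\Lambda}_N = \boldsymbol{\Lambda}_{N-1} + m(\mathbf{x}_N)\mathbf{x}_N\mathbf{x}_N^T$ and an assumed prior $\boldsymbol{\Lambda}_0 = \delta \mathbf{I}$. The function names and the parameter `delta` are hypothetical. The covariance form tracks $\boldsymbol{\Lambda}^{-1}$ directly via a rank-one (Sherman-Morrison) update, so it needs $\delta > 0$; the inverse covariance form tracks $\boldsymbol{\Lambda}$ and $\boldsymbol{\Lambda}\mathbf{w}$, accepts $\delta = 0$, but must solve a linear system whenever $\mathbf{w}$ is needed.

```python
import numpy as np

def rls_covariance_form(X, y, m, delta=1e-3):
    """Covariance form: tracks P = Lambda^{-1}; O(D^2) per update, no inversion.

    Requires an informative prior Lambda_0 = delta * I with delta > 0,
    because P_0 = Lambda_0^{-1} must exist.
    """
    d = X.shape[1]
    w = np.zeros(d)
    P = np.eye(d) / delta
    for x_n, y_n, m_n in zip(X, y, m):
        Px = P @ x_n
        # Sherman-Morrison update for Lambda_N = Lambda_{N-1} + m_n x_n x_n^T
        P = P - m_n * np.outer(Px, Px) / (1.0 + m_n * x_n @ Px)
        # w_N = w_{N-1} + m_n Lambda_N^{-1} x_n (y_n - w_{N-1}^T x_n)
        w = w + m_n * (P @ x_n) * (y_n - w @ x_n)
    return w, P

def rls_inverse_covariance_form(X, y, m, delta=0.0):
    """Inverse covariance form: tracks Lambda and the vector Lambda w.

    delta = 0 (a non-informative prior) is allowed here.
    """
    d = X.shape[1]
    Lam = delta * np.eye(d)
    Lam_w = np.zeros(d)
    for x_n, y_n, m_n in zip(X, y, m):
        Lam = Lam + m_n * np.outer(x_n, x_n)
        Lam_w = Lam_w + m_n * x_n * y_n
    # Recovering w = Lambda^{-1}(Lambda w) costs an O(D^3) solve; if w is
    # needed after every update, this solve is repeated every time.
    w = np.linalg.solve(Lam, Lam_w)
    return w, Lam
```

With the same $\delta > 0$ both functions return the same weight vector, and with $\delta = 0$ the inverse covariance form reproduces the matching-weighted least squares solution.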
This relation provides several insights:
The weight vector of the linear model corresponds to the system state of the
Kalman filter. Hence, it can be modelled by a multivariate Gaussian that,
in the notation of the RLS algorithm, is given by $\boldsymbol{\omega} \sim \mathcal{N}(\hat{\mathbf{w}}_N, (\tau \boldsymbol{\Lambda}_N)^{-1})$.
As $\tau$ is unknown, it needs to be substituted by its estimate $\hat{\tau}$.
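One way to see this Gaussian weight model in action is to draw weight vectors from it. The sketch below is a hypothetical helper, not taken from the text; it assumes NumPy and uses the Cholesky factor $\boldsymbol{\Lambda} = \mathbf{L}\mathbf{L}^T$: if $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, then $\hat{\mathbf{w}} + \mathbf{L}^{-T}\mathbf{z}/\sqrt{\hat{\tau}}$ has covariance $(\hat{\tau}\boldsymbol{\Lambda})^{-1}$.

```python
import numpy as np

def sample_weight_vectors(w_hat, Lam, tau_hat, n_samples, rng):
    """Draw samples of omega ~ N(w_hat, (tau_hat * Lam)^{-1}).

    Lam = L L^T (Cholesky); omega = w_hat + L^{-T} z / sqrt(tau_hat)
    with z ~ N(0, I) has covariance (L L^T)^{-1} / tau_hat.
    """
    L = np.linalg.cholesky(Lam)
    z = rng.standard_normal((n_samples, w_hat.shape[0]))
    # Solve L^T v = z for all samples at once instead of forming L^{-T}
    v = np.linalg.solve(L.T, z.T).T
    return w_hat + v / np.sqrt(tau_hat)
```

Working with the Cholesky factor of $\boldsymbol{\Lambda}$ avoids inverting $\boldsymbol{\Lambda}$ explicitly, which fits the inverse covariance parametrisation used by RLS.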
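Acquiring this model for $\boldsymbol{\omega}$ causes the output random variable $\upsilon$ to become
Gaussian as well. Hence, when the model is used for prediction, its predictions are
Gaussian. More specifically, given a new input $\mathbf{x}'$, the predictive density
is

$$y' \sim \mathcal{N}\left(\hat{\mathbf{w}}^T \mathbf{x}',\; \hat{\tau}^{-1}\left(\mathbf{x}'^T \boldsymbol{\Lambda}^{-1} \mathbf{x}' + m(\mathbf{x}')^{-1}\right)\right), \tag{5.60}$$

and is thus centred on $\hat{\mathbf{w}}^T \mathbf{x}'$. Its spread is determined on the one hand by the
estimated noise variance $(m(\mathbf{x}')\hat{\tau})^{-1}$ and on the other hand by the uncertainty of the
weight vector estimate $\mathbf{x}'^T (\hat{\tau}\boldsymbol{\Lambda})^{-1} \mathbf{x}'$. The $\boldsymbol{\Lambda}$ in the above equations refers to the one
estimated by the RLS algorithm.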
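As a minimal sketch of (5.60), assuming NumPy and the hypothetical names `w_hat`, `Lam`, `tau_hat` and `m_x` for $\hat{\mathbf{w}}$, $\boldsymbol{\Lambda}$, $\hat{\tau}$ and $m(\mathbf{x}')$, the predictive mean and variance can be computed with the variance split into its two sources:

```python
import numpy as np

def predictive_density(w_hat, Lam, tau_hat, x, m_x):
    """Mean and variance of y' ~ N(w_hat^T x, tau_hat^{-1}(x^T Lam^{-1} x + m(x)^{-1})).

    Returns (mean, variance, noise_part, weight_part), where
    variance = noise_part + weight_part. Requires m_x > 0.
    """
    mean = w_hat @ x
    noise_part = 1.0 / (m_x * tau_hat)                    # (m(x) tau_hat)^{-1}
    weight_part = x @ np.linalg.solve(tau_hat * Lam, x)   # x^T (tau_hat Lam)^{-1} x
    return mean, noise_part + weight_part, noise_part, weight_part
```

For example, with $\boldsymbol{\Lambda} = 2\mathbf{I}$, $\hat{\mathbf{w}} = (1, 2)^T$, $\hat{\tau} = 4$, $\mathbf{x}' = (1, 1)^T$ and $m(\mathbf{x}') = 0.5$, the mean is 3, the noise part is 0.5, the weight part is 0.25, and the total variance is 0.75.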
Following Hastie et al. [102, Chap. 8.2.1], the two-sided 95% confidence interval of
the standard normal distribution is given by considering its 97.5% point (as
$100\% - 2 \times 2.5\% = 95\%$), which is 1.96. Therefore, the 95% confidence
interval of the classifier predictions is centred on the mean of (5.60) and extends
1.96 times the square root of the prediction's variance to either side of the
mean.
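The interval construction can be sketched with Python's standard library, assuming the prediction's mean and variance come from (5.60); `prediction_interval` and its `level` parameter are hypothetical names. `NormalDist().inv_cdf(0.975)` gives the 97.5% point of the standard normal, which rounds to 1.96.

```python
import math
from statistics import NormalDist

def prediction_interval(mean, variance, level=0.95):
    """Two-sided interval mean +/- z * sqrt(variance) for a Gaussian prediction.

    z is the (0.5 + level/2) point of the standard normal; for level = 0.95
    this is the 97.5% point, approximately 1.96.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width
```

Computing `z` from `level` rather than hard-coding 1.96 lets the same helper produce other interval widths, such as 99%.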
In deriving the Kalman filter update equations, matching was embedded as a
modifier to the measurement noise variance, that is $\epsilon_n \sim \mathcal{N}(0, (m(\mathbf{x}_n)\tau)^{-1})$,
which gives us a new interpretation for matching: A matching value between
0 and 1 for a certain input can be interpreted as reducing the amount of
information that the model acquires about the associated observation by
increasing the noise of the observation and hence reducing its certainty.
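This interpretation can be checked numerically. The sketch below, assuming NumPy, a static state without process noise, and hypothetical function names, runs one Kalman measurement update with noise variance $(m(\mathbf{x})\tau)^{-1}$ next to a matching-weighted RLS step in inverse covariance form. With covariance $\boldsymbol{\Sigma} = (\tau\boldsymbol{\Lambda})^{-1}$, the Kalman gain simplifies to $m(\mathbf{x})\boldsymbol{\Lambda}_{\text{new}}^{-1}\mathbf{x}$, so $\tau$ cancels and both updates coincide; a small matching value shrinks the gain.

```python
import numpy as np

def kalman_update(w, Sigma, x, y, m_x, tau):
    """One Kalman measurement update for a static state (no process noise).

    The measurement noise variance is (m(x) tau)^{-1}: a small matching
    value inflates the noise and shrinks the gain. Requires m_x > 0.
    """
    R = 1.0 / (m_x * tau)                  # measurement noise variance
    Sx = Sigma @ x
    k = Sx / (x @ Sx + R)                  # Kalman gain
    w_new = w + k * (y - w @ x)
    Sigma_new = Sigma - np.outer(k, Sx)    # (I - k x^T) Sigma, Sigma symmetric
    return w_new, Sigma_new

def matched_rls_update(w, Lam, x, y, m_x):
    """Matching-weighted RLS step in inverse covariance form."""
    Lam_new = Lam + m_x * np.outer(x, x)
    w_new = w + m_x * np.linalg.solve(Lam_new, x) * (y - w @ x)
    return w_new, Lam_new
```

Running both updates over the same data and matching values keeps $\boldsymbol{\Sigma}$ equal to $(\tau\boldsymbol{\Lambda})^{-1}$ and the two weight vectors identical.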
A similar interpretation can be given for RLS with recency-weighting: the
decay factor λ acts as a multiplier to the noise precision of past observations
and hence reduces their certainty. This causes the model to put more emphasis
on more recent observations due to their lower noise variance. Formally,
modelling the noise for the $n$th observation after $N$ observations by

$$\epsilon_n \sim \mathcal{N}\left(0, \left(m(\mathbf{x}_n)\,\tau\,\lambda^{\sum_{j=n+1}^{N} m(\mathbf{x}_j)}\right)^{-1}\right) \tag{5.61}$$

causes the Kalman filter to perform the same recency weighting as the recency-
weighted RLS variant.
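Because the noise precision of past observations in (5.61) changes as $N$ grows, the equivalence is easiest to check in batch form. The sketch below assumes NumPy and assumes that the recency-weighted RLS variant scales $\boldsymbol{\Lambda}$ by $\lambda^{m(\mathbf{x}_N)}$ before adding each new observation, starting from a hypothetical prior $\delta\mathbf{I}$. It compares that recursion with a batch weighted fit in which observation $n$ has precision $m(\mathbf{x}_n)\tau\lambda^{\sum_{j=n+1}^N m(\mathbf{x}_j)}$; $\tau$ cancels in the mean and is omitted. All function and parameter names are illustrative.

```python
import numpy as np

def recency_weighted_rls(X, y, m, lam, delta):
    """Recency-weighted RLS (inverse covariance form), assuming the variant
    that scales Lambda by lam**m(x_N) before adding the new observation."""
    d = X.shape[1]
    Lam = delta * np.eye(d)
    w = np.zeros(d)
    for x_n, y_n, m_n in zip(X, y, m):
        Lam = lam ** m_n * Lam + m_n * np.outer(x_n, x_n)
        w = w + m_n * np.linalg.solve(Lam, x_n) * (y_n - w @ x_n)
    return w, Lam

def batch_with_decayed_precision(X, y, m, lam, delta):
    """Batch posterior mean with per-observation noise precision
    m(x_n) * lam**(sum_{j=n+1}^N m(x_j)) as in (5.61) (tau cancels).
    The prior delta * I is decayed by lam**(sum of all m) to match the recursion."""
    # later[n] = sum_{j>n} m_j
    later = np.concatenate([np.cumsum(m[::-1])[::-1][1:], [0.0]])
    prec = m * lam ** later
    Lam = lam ** m.sum() * delta * np.eye(X.shape[1]) + (X * prec[:, None]).T @ X
    w = np.linalg.solve(Lam, (X * prec[:, None]).T @ y)
    return w, Lam
```

Both functions return the same $\boldsymbol{\Lambda}$ and weight vector, so decaying the noise precision of older observations as in (5.61) reproduces the recursive recency weighting.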