where
$$c_k = \sum_{n=1}^{N} m_k(x_n) = \operatorname{Tr}(M_k), \tag{5.12}$$
is the match count of classifier $k$, and is in this chapter simply denoted $c$. $\operatorname{Tr}(M)$
denotes the trace of the matrix $M$, which is the sum of its diagonal elements.
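As a small illustration of (5.12), the following sketch computes the match count both as a sum of matching values and as the trace of a matching matrix. It assumes that $M_k$ is the diagonal matrix holding the values $m_k(x_n)$ and that matching values may be fractional; the function names and the sample values are hypothetical and only serve the example.

```python
def match_count(matching):
    """Match count c_k: the sum of the matching values m_k(x_n), as in (5.12)."""
    return sum(matching)


def matching_matrix(matching):
    """Assumed form of M_k: a diagonal matrix with m_k(x_n) on the diagonal."""
    size = len(matching)
    return [[matching[i] if i == j else 0.0 for j in range(size)]
            for i in range(size)]


def trace(matrix):
    """Trace of a square matrix: the sum of its diagonal elements."""
    return sum(matrix[i][i] for i in range(len(matrix)))


# Hypothetical matching values for four observations (0 = not matched).
m = [1.0, 0.0, 0.5, 1.0]
M = matching_matrix(m)
c = match_count(m)
print(c, trace(M))
```

Both quantities agree by construction, since the off-diagonal entries of the assumed matching matrix are zero.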
Hence, the inverse noise precision, that is, the noise variance, is given by the
average squared error of the model output estimates over all matched observations.
Note, however, that the precision estimate is biased, as it is based on another
estimate $w$ [97, Chap. 5]. This can be accounted for by instead using
$$\tau^{-1} = (c - D_X)^{-1} \, \| X w - y \|_M^2, \tag{5.13}$$
which is the unbiased estimate of the noise precision.
To summarise, the maximum likelihood model parameters of a classifier using
batch learning are found by first evaluating (5.8) to get w and then (5.13) to
get τ .
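The two-step procedure can be sketched in plain Python. Equation (5.8) is not reproduced in this section, so the sketch assumes it is the matching-weighted least-squares solution $w = (X^T M X)^{-1} X^T M y$, and it reads $\| X w - y \|_M^2$ as the matching-weighted sum of squared residuals. The names `solve` and `batch_fit` and the data are hypothetical. The function returns the noise variance $\tau^{-1}$ rather than $\tau$, which avoids a division by zero when the fit is exact.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T M X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def batch_fit(X, y, m):
    """Return (w, noise variance) for one classifier.

    X: list of input rows, y: outputs, m: matching values m(x_n).
    Assumes (5.8) is weighted least squares; the variance follows (5.13).
    """
    rows, d_x = len(X), len(X[0])
    c = sum(m)  # match count, as in (5.12)
    if c <= d_x:
        raise ValueError("need c > D_X for the unbiased estimate")
    # Build X^T M X and X^T M y without forming M explicitly.
    A = [[sum(m[n] * X[n][i] * X[n][j] for n in range(rows))
          for j in range(d_x)] for i in range(d_x)]
    b = [sum(m[n] * X[n][i] * y[n] for n in range(rows)) for i in range(d_x)]
    w = solve(A, b)
    # Matching-weighted sum of squared residuals, ||Xw - y||_M^2.
    sq_err = sum(m[n] * (sum(X[n][i] * w[i] for i in range(d_x)) - y[n]) ** 2
                 for n in range(rows))
    return w, sq_err / (c - d_x)


# Hypothetical data: a bias term plus one input; the last row is not matched.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.1, 2.9, 5.2, 6.8]
m = [1.0, 1.0, 1.0, 0.0]
w, noise_variance = batch_fit(X, y, m)
print(w, noise_variance)
```

Because the last observation has a matching value of zero, it influences neither the weight vector nor the variance estimate.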
Example 5.2 (Batch Learning with Averaging Classifiers). Averaging classifiers
are characterised by using $x_n = 1$ for all $n$ for their linear model. Hence, we have
$X = (1, \dots, 1)^T$, and evaluating (5.8) results in the scalar weight estimate
$$w = c^{-1} \sum_{n=1}^{N} m(x_n) \, y_n, \tag{5.14}$$
which is the outputs $y_n$ averaged over all matched inputs. Note that, as discussed
in Sect. 3.2.3, the inputs to the matching function as appearing in $m(x_n)$ are
not necessarily the same as the ones used to build the local model. In the case
of averaging classifiers this differentiation is essential, as the inputs $x_n = 1$ used
for building the local models do not carry any information that can be used for
localisation of the classifiers.
The noise precision is determined by evaluating (5.13) and results in
$$\tau^{-1} = (c - 1)^{-1} \sum_{n=1}^{N} m(x_n) \, (w - y_n)^2, \tag{5.15}$$
which is the unbiased average of the squared deviations of the outputs from
their average, and hence indicates the prediction error that can be
expected from the linear model.
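Equations (5.14) and (5.15) can be sketched directly for an averaging classifier. The interval-based matching function, the function names, and the data below are hypothetical; they illustrate that matching operates on a separate input while the model input is the constant 1.

```python
def interval_match(a, lower, upper):
    """Hypothetical matching function: 1 inside [lower, upper], else 0."""
    return 1.0 if lower <= a <= upper else 0.0


def averaging_classifier(y, m):
    """Return (w, noise variance) following (5.14) and (5.15)."""
    c = sum(m)  # match count
    if c <= 1:
        raise ValueError("need c > 1 for the unbiased variance estimate")
    # (5.14): matched outputs averaged over the match count.
    w = sum(mn * yn for mn, yn in zip(m, y)) / c
    # (5.15): squared deviations from the average, divided by c - 1.
    variance = sum(mn * (w - yn) ** 2 for mn, yn in zip(m, y)) / (c - 1)
    return w, variance


# Inputs used only for matching; the model input is x_n = 1 throughout.
inputs = [0.0, 1.0, 2.0, 9.0]
outputs = [2.0, 4.0, 6.0, 100.0]
matching = [interval_match(a, 0.0, 2.0) for a in inputs]
w, variance = averaging_classifier(outputs, matching)
print(w, variance)
```

The unmatched output 100.0 is excluded from both the average and the variance, so the estimates describe only the region the classifier matches.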
5.3 Incremental Learning Approaches to Regression
Having derived the batch learning solution, let us now consider the case where
we want to update our model with each additional observation. In particular,
assume that the model parameters $w_N$ and $\tau_N$ are based on $N$ observations, and
the new observation $(x_{N+1}, y_{N+1})$ is to be incorporated, to get the updated
 