Solving for $\lambda$ and using $\sum_j w_j = 1$ and $\sum_j y_{nj} = 1$ for all $n$, we get $\lambda = \sum_n m(x_n) = c$, which is the match count after $N$ observations. As a result, by the principle of maximum likelihood, $w$ after $N$ observations is given by
\[
w = c^{-1} \sum_{n=1}^{N} m(x_n) \, y_n . \tag{5.82}
\]
Thus, the $j$th element of $w$, representing the probability of the classifier having generated an observation of class $j$, is the number of matched observations of this class divided by the total number of matched observations $c$, a straightforward frequentist measure.
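The batch estimate (5.82) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `ml_class_probabilities`, the list-based 1-of-K encoding of the labels, and the toy data are hypothetical and not taken from the text.

```python
def ml_class_probabilities(matching, labels):
    """Batch estimate of w following (5.82).

    matching: list of m(x_n) values, one per observation (hypothetical input format)
    labels:   list of 1-of-K vectors y_n, each a list that sums to 1
    Returns the estimate w and the match count c.
    """
    c = sum(matching)  # match count c = sum_n m(x_n)
    if c == 0:
        raise ValueError("classifier matches no observations")
    num_classes = len(labels[0])
    w = [0.0] * num_classes
    for m, y in zip(matching, labels):
        for j in range(num_classes):
            w[j] += m * y[j]  # accumulate sum_n m(x_n) y_nj
    return [wj / c for wj in w], c


# Toy data (made up): five observations, two classes, binary matching.
matching = [1, 1, 0, 1, 1]
labels = [[1, 0], [0, 1], [1, 0], [0, 1], [0, 1]]
w, c = ml_class_probabilities(matching, labels)
print(w, c)  # prints [0.25, 0.75] 4
```

Of the four matched observations, one belongs to class 1 and three to class 2, so the estimate is simply the per-class share of matched observations.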
5.5.3 Incremental Learning for Classification
Let $w_N$ be the estimate of $w$ after $N$ observations. Given the new observation $(x_{N+1}, y_{N+1})$, the aim of the incremental approach is to find a computationally efficient way to update $w_N$ to reflect this new knowledge. By (5.82), $c_{N+1} w_{N+1}$ is given by
\[
\begin{aligned}
c_{N+1} w_{N+1} &= \sum_{n=1}^{N+1} m(x_n) \, y_n \\
&= \sum_{n=1}^{N} m(x_n) \, y_n + m(x_{N+1}) \, y_{N+1} \\
&= \left( c_{N+1} - m(x_{N+1}) \right) w_N + m(x_{N+1}) \, y_{N+1} ,
\end{aligned}
\tag{5.83}
\]
where the last step uses $c_N w_N = \sum_{n=1}^{N} m(x_n) \, y_n$ and $c_N = c_{N+1} - m(x_{N+1})$. Dividing the above by $c_{N+1}$ results in the final incremental update
\[
w_{N+1} = w_N - c_{N+1}^{-1} \, m(x_{N+1}) \left( w_N - y_{N+1} \right) . \tag{5.84}
\]
This update tracks (5.82) accurately, is of complexity $\mathcal{O}(D_Y)$, and only requires the parameter vector $w$ and the match count $c$ to be stored. Thus, it is accurate and efficient.
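As a rough check of the incremental update (5.84), the following Python sketch applies it one observation at a time and compares the result with the batch estimate (5.82). The function name `incremental_update`, the arbitrary starting value for `w`, and the toy data are assumptions made for illustration only; the match count is updated as c_{N+1} = c_N + m(x_{N+1}), which is what the derivation of (5.83) implies.

```python
def incremental_update(w, c, m_new, y_new):
    """One step of (5.84): w <- w - (m / c_new) * (w - y), with c_new = c + m."""
    c_new = c + m_new
    if m_new == 0 or c_new == 0:
        # An unmatched observation leaves the estimate unchanged.
        return list(w), c_new
    step = m_new / c_new
    # One pass over the class dimension, so the cost per update is O(D_Y).
    w_new = [wj - step * (wj - yj) for wj, yj in zip(w, y_new)]
    return w_new, c_new


# Toy data (made up): five observations, two classes, binary matching.
matching = [1, 1, 0, 1, 1]
labels = [[1, 0], [0, 1], [1, 0], [0, 1], [0, 1]]

# The starting value of w is arbitrary: the first matched observation
# has step = 1 and therefore overwrites it with y.
w, c = [0.5, 0.5], 0
for m, y in zip(matching, labels):
    w, c = incremental_update(w, c, m, y)

# Batch estimate (5.82) on the same data, for comparison.
batch_c = sum(matching)
batch_w = [sum(m * y[j] for m, y in zip(matching, labels)) / batch_c
           for j in range(2)]

print(w, batch_w)
```

Up to floating-point rounding, the incremental and batch estimates agree, and only `w` and `c` need to be kept between updates.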
Example 5.9 (Classifier Model for Classification). Figure 5.3 shows the data of a classification task with two distinct classes. Observations of classes 1 and 2 are shown by circles and squares, respectively. The larger rectangles indicate the matched areas of the input space of the three classifiers $c_1$, $c_2$, and $c_3$. Based on these data, the number of matched observations of each class as well as $w$ and $\tau$ are shown for each classifier in Table 5.2.
Recall that the elements of $w$ represent the estimated probabilities of having generated an observation of a specific class. The estimates in Table 5.2 show that classifier $c_3$ is most certain about modelling class 2, while classifier $c_2$ is most uncertain about which class it models. These values are also reflected in $\tau^{-1}$, which is highest for $c_2$ and lowest for $c_3$. Thus, $c_3$ is the "best" classifier, while $c_2$ is the "worst", an evaluation that reflects what can be observed in Fig. 5.3.
 