the least distance) and their mode, the maximally represented class, is attributed to the sample. In Figure 8.5, the 3-nearest neighbour is actually Class B, since the three nearest samples contain one from Class A (its nearest neighbour) and two from Class B. Since there are two elements of Class B, the sample is attributed to this class by the 3-nearest neighbour rule. As such, selection from more than one point introduces a form of feature space smoothing and allows the classification decision not to be affected by noisy outlier points. Clearly, this smoothing has greater effect for larger values of k. (Further details concerning a more modern view of the k-nearest neighbour rule can be found in Michie et al. (1994).)
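The voting step described above can be sketched in Python; this is a hypothetical illustration, not the book's Mathcad code, and the sample coordinates are invented to reproduce the Figure 8.5 situation (the nearest sample is Class A, but two of the three nearest are Class B):

```python
# Sketch of k-nearest-neighbour voting: rank training samples by distance
# to the test point, then take the mode of the k nearest class labels.
from collections import Counter
import math

def k_nn_vote(test, samples, k):
    """samples: list of (feature_vector, class_label) pairs."""
    # Sort all samples by Euclidean distance to the test point.
    ranked = sorted(samples, key=lambda s: math.dist(test, s[0]))
    # The maximally represented class among the k nearest wins.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# One Class A sample is nearest, but two Class B samples are close by:
samples = [((1.0, 1.0), 'A'), ((1.5, 1.2), 'B'), ((1.4, 0.8), 'B'),
           ((4.0, 4.0), 'A'), ((5.0, 5.0), 'B')]
print(k_nn_vote((1.1, 1.0), samples, 1))   # → 'A' (nearest neighbour alone)
print(k_nn_vote((1.1, 1.0), samples, 3))   # → 'B' (3-NN voting overrules it)
```

The contrast between k = 1 and k = 3 on the same data shows the smoothing effect: the single noisy-looking Class A point no longer decides the outcome once more neighbours vote.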
A Mathcad implementation of the k-nearest neighbour rule is given in Code 8.5. The arguments are test (the vector of measurements of the test sample), data (the list of vectors of measurements of all samples), size (the value of k) and no. The final parameter no dictates the structure of the presented data and is the number of classes within that data. The training data is presumed to have been arranged so that samples of each class are all stored together. For two classes in the training data, no = 2, and each class occupies one-half of the data set (the same situation as in Figure 8.5). If no = 3 then there are three classes, each occupying one-third of the complete data set: the first third contains the first class, the second third contains samples of another class, whilst the remaining third contains samples of the final class. In application, first the distances between the current sample, test, and all other samples are evaluated by using the function distance. Then the k nearest neighbours are selected to form a vector of distances min; these are the k neighbours which are closest (in the feature space) to the sample test. The number of feature space splits fsp is the spacing between the classes in the data. The class which occurs the most times in the set of size nearest neighbours is then returned as the k-nearest neighbour, by incrementing the class number to which each of the k neighbours is associated. (If no such decision is possible, i.e. there is no maximally represented class, then the technique can be arranged to return the class of the nearest neighbour, by default.)
k_nn(test,data,size,no):=
    for i ∈ 0..rows(data)-1
        dist_i ← distance(test,data,i)
    for i ∈ 0..size-1
        posmin ← coord(min(dist),dist)
        dist_posmin ← max(dist)+1
        min_i ← posmin
    fsp ← rows(data)/no
    for j ∈ 1..no
        class_j ← 0
    for i ∈ 0..size-1
        for j ∈ 1..no
            class_j ← class_j+1 if [min_i ≥ (j-1)·fsp]·(min_i < j·fsp)
    test_class ← coord(max(class),class)
    test_class
Code 8.5   Implementing the k-nearest neighbour rule
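The logic of Code 8.5 can be rendered in Python as follows. This is a sketch under the assumptions the text states: distance is Euclidean, and data holds all training vectors with the no classes stored in equal contiguous blocks, so fsp (the block size) maps a sample's index to its class:

```python
# Python rendering of the k_nn logic in Code 8.5 (assumed Euclidean distance;
# classes stored in equal contiguous blocks of the training data).
import math

def k_nn(test, data, size, no):
    # Distance from the test sample to every training sample.
    dist = [math.dist(test, row) for row in data]
    # Select the indices of the `size` nearest neighbours one at a time,
    # masking each chosen entry (as Code 8.5 does) so it is not re-selected.
    nearest = []
    for _ in range(size):
        posmin = dist.index(min(dist))
        dist[posmin] = max(dist) + 1
        nearest.append(posmin)
    # Feature-space split: number of samples in each class block.
    fsp = len(data) // no
    # Vote: increment the class whose block each neighbour's index falls in.
    votes = [0] * no
    for idx in nearest:
        votes[idx // fsp] += 1
    # Return the maximally represented class, numbered 1..no
    # (ties fall to the lower-numbered class in this sketch).
    return votes.index(max(votes)) + 1

data = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]  # two classes of three
print(k_nn((0.2, 0.2), data, 3, 2))   # → 1 (all 3 neighbours in the first block)
```

Note the tie-breaking differs from the book's suggestion of defaulting to the nearest neighbour's class; here a tie simply falls to the lower-numbered class.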