Algorithm 2 First phase: a FE algorithm (BVQ in this example) is applied to the training set, and then the feature ranking algorithm is executed
1: Let X = {x_1, x_2, ..., x_m} be the m-dimensional normalized dataset.
2: Apply the BVQ algorithm to X. Let Y = {y_1, y_2, ..., y_n} be the extracted eigenfeatures.
3: Compute the contributive weight w_i of each feature x_i to the eigenfeatures of Y.
4: Sort the features of X such that x_a < x_b if w_a < w_b. Let X^s = {x^s_1, x^s_2, ..., x^s_m} be the sorted dataset and m the rank index.
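Steps 3-4 can be sketched as follows, assuming the BVQ/DBFE step has already produced an eigenfeature matrix (one column per eigenfeature, expressed in the original feature space) together with its eigenvalues. The eigenvalue-weighted sum of squared loadings used here as the contributive weight w_i, and the choice of putting the most contributive features first, are illustrative assumptions, not the chapter's exact formulation.

```python
import numpy as np

def rank_features(eigvecs, eigvals):
    """Sketch of steps 3-4 of Algorithm 2.

    eigvecs : (m, n) array, column j is the j-th eigenfeature expressed in the
              original m-dimensional feature space (output of the BVQ/DBFE step).
    eigvals : (n,) array with the corresponding eigenvalues.

    The contributive weight w_i of feature x_i is taken here as the
    eigenvalue-weighted sum of its squared loadings (an assumption made
    for illustration only).
    """
    weights = (eigvecs ** 2) @ eigvals      # w_i for each original feature
    order = np.argsort(weights)[::-1]       # most contributive features first
    return order, weights

# Hypothetical usage: reorder the columns of the normalized dataset X
# order, w = rank_features(eigvecs, eigvals)
# X_sorted = X[:, order]
```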
Algorithm 3 Second phase: on an incrementing number of features, taken in rank order from the test set, the 1NN classification process is run and the accuracy calculated
1: The dataset X^s is given as input.
2: Apply 1NN to the whole X^s; let A_m be the returned accuracy.
3: For rank i = 1 to m (where m = 22 for this dataset):
   let X_i = {x_1, x_2, ..., x_i} be the subset of X^s containing the selected features up to rank i;
   compute the accuracy A_i using 1NN with 10-fold cross-validation.
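A minimal sketch of this second phase, using scikit-learn's 1NN classifier and 10-fold cross-validation; the function name and the assumption that the columns of X_sorted are already in rank order are ours:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def accuracy_curve(X_sorted, y, n_folds=10):
    """Accuracy of 1NN on growing feature subsets (Algorithm 3 sketch).

    X_sorted : (n_samples, m) array whose columns are already in rank order.
    y        : class labels.
    Returns the list [A_1, ..., A_m] of 10-fold cross-validated accuracies.
    """
    accuracies = []
    for i in range(1, X_sorted.shape[1] + 1):
        clf = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
        scores = cross_val_score(clf, X_sorted[:, :i], y, cv=n_folds)
        accuracies.append(scores.mean())
    return accuracies
```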
For the first fold, the decision boundary depicted by BVQ is reported in Fig. 4.3b, together with the features extracted on the basis of the DBFE method.
The BVQ setting: optimal values for Δ and for the local region r have been found by a manually conducted search, assuming the classification error rate as the objective function. The parameters were fixed to Δ = 0.4 and r = 0.5; 16 code vectors have been detected. The choice of the classification algorithm is unimportant for our purpose, since we are interested only in studying the relative performance of the ranking algorithms. 1NN is a non-parametric classifier and among the simplest of all machine learning algorithms: an object is simply assigned to the class of its nearest neighbour on the basis of the Euclidean distance, and it requires no settings. In [21] the 1NN classifier is indicated as a convenient algorithm to build the evaluation function, since it appears to provide reasonable classification performance in most applications.
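The 1NN decision rule described above is simple enough to write down directly; a minimal sketch with illustrative names (not the book's code):

```python
import numpy as np

def predict_1nn(X_train, y_train, x):
    """Minimal 1NN rule: assign x to the class of its nearest training point
    under the Euclidean distance; no parameters need to be tuned."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]
```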
For this experiment the resulting accuracies, in the order they were calculated, are reported in Table 4.1 and plotted alongside. The curve shows a steep rise, which expresses the high contribution to classification accuracy of the two highest-rank features. Beyond a critical point, which in this example occurs at the second feature, the curve tends to decrease because irrelevant (low-rank) features are added, which only contribute to the curse of dimensionality. By way of comparison, a random sorting of the features has been used, to which the same validation procedure is applied. The fifth column of Table 4.1 and the corresponding plot represent the average accuracy achieved by 20 different 1NN classifiers, where the features are selected according to 20 different ranks obtained by means of trivial random permutations.
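The random-permutation baseline can be reproduced in a few lines on top of the accuracy_curve sketch given earlier; both that helper and the seed are illustrative assumptions:

```python
import numpy as np

def random_rank_baseline(X, y, n_perms=20, seed=0):
    """Average accuracy curve over random feature orderings, mirroring the
    20 random permutations used for comparison in the text.
    Relies on the accuracy_curve() sketch defined above (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    curves = []
    for _ in range(n_perms):
        perm = rng.permutation(X.shape[1])
        curves.append(accuracy_curve(X[:, perm], y))
    return np.mean(curves, axis=0)   # average accuracy per number of features
```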
As a figure of merit to characterize the performance of the ranking method, we define an empirical Performance Index:

Performance Index = (AreaFR − AreaRP) / (AreaMax − AreaRP)    (4.7)
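A small sketch of how Eq. (4.7) could be evaluated from the two accuracy curves; the use of trapezoidal integration and the definition of AreaMax as the area under a constant curve at the maximum accuracy are assumptions of this sketch, since the excerpt does not spell them out:

```python
import numpy as np

def performance_index(acc_ranked, acc_random, acc_max=1.0):
    """Sketch of Eq. (4.7). Assumptions (not stated explicitly in this excerpt):
    the areas are trapezoidal integrals of the accuracy curves over the number
    of selected features, and AreaMax is the area of a constant curve at the
    maximum accuracy `acc_max`."""
    x = np.arange(1, len(acc_ranked) + 1)
    area_fr = np.trapz(acc_ranked, x)                          # feature-ranking curve
    area_rp = np.trapz(acc_random, x)                          # random-permutation curve
    area_max = np.trapz(np.full_like(x, acc_max, dtype=float), x)
    return (area_fr - area_rp) / (area_max - area_rp)
```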
 