relevant (the one with the highest gain ratio) to the least relevant (the one with the lowest gain ratio). Then, a decision tree is created starting with the most relevant feature. This method is computationally efficient because it tests at most a number of cases equal to the number of features. The danger is that if none of the features is significantly better than the others, the method may fail to find a good subset; by contrast, if there is a strongly relevant feature, the method gives reasonably good results.
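As a rough sketch of this ranking step (function names are illustrative, not taken from the cited works), the gain ratio of each discrete feature can be computed as its information gain divided by its split information, and the features sorted accordingly:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    h_before = entropy(labels)
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    # Weighted entropy of the label after splitting on the feature values.
    h_after = sum(w * entropy(labels[feature == v])
                  for v, w in zip(values, weights))
    split_info = -np.sum(weights * np.log2(weights))
    return (h_before - h_after) / split_info if split_info > 0 else 0.0

def rank_by_gain_ratio(X, y):
    """Feature indices ordered from highest to lowest gain ratio."""
    ratios = [gain_ratio(X[:, j], y) for j in range(X.shape[1])]
    return sorted(range(X.shape[1]), key=lambda j: ratios[j], reverse=True)
```

Note that only one pass over the features is needed, which is what makes the method computationally cheap compared with searching subsets of features.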
The One-Rule algorithm [18] ranks the attributes according to the error rate of a one-level rule built on each attribute. This method is considerably affected by overfitting.
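A minimal sketch of this ranking (the helper names are illustrative) builds, for each feature, the rule that maps every feature value to its majority class and scores the feature by that rule's training error:

```python
import numpy as np
from collections import Counter

def one_rule_error(feature, labels):
    """Training error rate of the one-level rule that maps each
    feature value to the majority class among samples with that value."""
    errors = 0
    for v in np.unique(feature):
        subset = labels[feature == v]
        majority_count = Counter(subset).most_common(1)[0][1]
        errors += len(subset) - majority_count
    return errors / len(labels)

def rank_by_one_rule(X, y):
    """Feature indices ordered from lowest to highest 1R error rate."""
    errs = [one_rule_error(X[:, j], y) for j in range(X.shape[1])]
    return sorted(range(X.shape[1]), key=lambda j: errs[j])
```

The overfitting risk is visible here: a feature with many distinct values (an ID-like attribute, in the extreme) achieves near-zero training error without being predictive on new data.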
The Relief algorithm uses a nearest-neighbor approach [20]. The algorithm iteratively updates a relevance vector, whose length equals the number of features and which is initially set to zero. In a two-class problem, for a randomly chosen sample, one nearest point is chosen in the same class (the nearest hit) and one in the opposite class (the nearest miss). The squared component-wise distances to these two closest examples are subtracted from the relevance vector (for the nearest hit) or added to it (for the nearest miss). This procedure is repeated m times (m being a given parameter), and those features whose relevance weights, thus computed, are above a certain threshold are selected. An improvement of the basic algorithm is Relief-F [23], which uses M nearest hits, instead of just one, and ensures greater robustness of the algorithm against noise.
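The update loop described above can be sketched as follows for the basic two-class case (a simplified illustration under the stated assumptions, not the reference implementation; no distance normalization is applied):

```python
import numpy as np

def relief(X, y, m=100, seed=0):
    """Basic two-class Relief: returns one relevance weight per feature.

    For each of m randomly drawn samples, the squared per-feature
    distance to the nearest hit (same class) is subtracted from the
    weight vector and the distance to the nearest miss (opposite
    class) is added to it.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)
        x = X[i]
        hit_mask = (y == y[i])
        hit_mask[i] = False          # exclude the sample itself
        miss_mask = (y != y[i])
        hit = X[hit_mask][np.argmin(((X[hit_mask] - x) ** 2).sum(axis=1))]
        miss = X[miss_mask][np.argmin(((X[miss_mask] - x) ** 2).sum(axis=1))]
        w += (x - miss) ** 2 - (x - hit) ** 2
    return w / m

def select_features(weights, threshold):
    """Indices of features whose relevance weight exceeds the threshold."""
    return [j for j, wj in enumerate(weights) if wj > threshold]
```

A feature that separates the classes keeps hits close and misses far apart, so its weight grows positive, while an irrelevant feature drifts toward zero or below. Relief-F would replace the single nearest hit and miss with the average over M nearest neighbors.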
Current scientific research focuses on topics related to the data explosion phenomenon, such as FS for ultrahigh-dimensional data [30] and multi-source FS [38]. In [13] there is a case study on feature selection techniques applied to geographic information systems and geospatial decision support, an application domain where the growing availability of data poses several challenges along with important perspectives. There is growing interest in considering FS as something more than just a routine to improve machine learning accuracy; the FR model is itself a knowledge model holding important semantic aspects of the information environment. There have been attempts to further enrich the concept of a relevant feature with semantic meanings, such as the contribution of a feature to the knowledge of the physical process underlying the generation of the data. The usefulness of the FR in selecting the variables for modelling dynamic systems has been studied in [5]. A causal feature selection is proposed in [17], where the FS is driven by the detection of cause-effect relationships observed over time. This kind of selection process explicitly associates the concept of a relevant feature with the concept of a control variable. A step forward in the contribution of FS to the modelling of a real system is provided by [12, 33], which take the interaction of features into account in the selection process, acknowledging the fact that features exhibit group properties that cannot be detected on individual features, as if they were actual components of a system. More recently there have been attempts to integrate FS with preexisting bodies of knowledge such as ontologies and association rules [7].