estimate the relationship between any attribute and the class. This requires coding the target class as a binary vector.
Again, these types of measures are closely related to information and distance measures. Features with a strong association with the class are good features for predictive tasks. One of the most widely used dependence measures is the Bhattacharyya dependence measure B, defined as:

B(A_j) = − Σ_i P(c_i) log ∫ √( P(A_j = a | c_i) P(A_j = a) ) da
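For a discrete attribute, the integral above becomes a sum over the observed values of A_j, and the probabilities can be estimated by frequency counts. The following sketch illustrates this (the function name and the plug-in probability estimates are my own, not from the text):

```python
from collections import Counter
from math import log, sqrt

def bhattacharyya_dependence(attribute_values, class_labels):
    """Discrete sketch of the Bhattacharyya dependence measure B(A_j):
    for each class c_i, compute the Bhattacharyya coefficient between
    P(A_j = a | c_i) and P(A_j = a), take -log, and weight by P(c_i).
    Larger values indicate a stronger attribute/class association."""
    n = len(class_labels)
    attr_counts = Counter(attribute_values)    # counts of A_j = a
    class_counts = Counter(class_labels)       # counts of c_i
    score = 0.0
    for c, n_c in class_counts.items():
        given_c = Counter(a for a, y in zip(attribute_values, class_labels)
                          if y == c)
        # Bhattacharyya coefficient between P(A_j | c) and P(A_j)
        coeff = sum(sqrt((given_c.get(a, 0) / n_c) * (n_a / n))
                    for a, n_a in attr_counts.items())
        if coeff > 0:
            score += (n_c / n) * (-log(coeff))
    return score
```

An attribute independent of the class yields a coefficient of 1 per class and hence a score of 0, while a perfectly predictive attribute scores strictly above 0.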
7.2.2.4 Consistency Measures
The previous measures attempt to find the features that best separate one class from the others, but they are not able to detect whether one of those features is redundant. Consistency measures, on the other hand, attempt to find a minimum number of features that separate the classes as well as the full set of features can. They aim to achieve

P(C | FullSet) = P(C | SubSet)

Feature evaluation rules derived from consistency measures state that we should select the minimum subset of features that can maintain the consistency of the data as observed with the full set of features. An inconsistency is defined as the case of two examples with the same inputs (same feature values) but with different output feature values (classes in classification). Using these measures, both irrelevant and redundant features can be removed.
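A common way to operationalize this criterion is to count inconsistencies: group the examples by their projection onto a candidate subset, and count every example beyond the majority class of its group. A minimal sketch (the function and parameter names are illustrative):

```python
from collections import Counter, defaultdict

def inconsistency_count(rows, subset, labels):
    """Count inconsistencies with respect to a feature subset: examples
    that agree on all features in `subset` but disagree on the class.
    Within each group, every example beyond the majority class counts
    as one inconsistency."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in subset)   # projection onto the subset
        groups[key][y] += 1
    # group size minus the largest class count, summed over groups
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())
```

A subset would then be acceptable when its inconsistency count equals the count obtained with the full feature set.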
7.2.2.5 Accuracy Measures
This form of evaluation relies on the classifier or learner itself. Among the various possible subsets of features, the subset that yields the best predictive accuracy is chosen. This family is distinguished from the previous four because it is directly focused on improving the accuracy of the same learner used in the DM task. However, we have to take some considerations into account. Firstly, how to truly estimate the predictive accuracy while avoiding the problem of over-fitting. Secondly, it is important to consider the time required by the DM model to complete learning from the data (classifiers usually perform more complex tasks than the computation of any of the four measures seen above). Lastly, the subset of features could be biased towards a single learning model, producing feature subsets that do not generalize.
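This wrapper idea can be sketched as follows, using a simple 1-NN classifier evaluated by leave-one-out accuracy and an exhaustive search over subsets (both choices are my own for illustration; real FS would use the actual DM learner, cross-validation, and a heuristic search):

```python
from itertools import combinations

def knn_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of a 1-NN classifier restricted to the
    features in `subset` (a stand-in for the learner of the DM task)."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in subset)
    hits = 0
    for i, row in enumerate(rows):
        # nearest neighbour among all other examples
        j = min((k for k in range(len(rows)) if k != i),
                key=lambda k: dist(row, rows[k]))
        hits += labels[j] == labels[i]
    return hits / len(rows)

def best_subset(rows, labels, n_features):
    """Exhaustive wrapper search: return the subset with the highest
    estimated accuracy."""
    all_subsets = (s for r in range(1, n_features + 1)
                   for s in combinations(range(n_features), r))
    return max(all_subsets, key=lambda s: knn_accuracy(rows, labels, s))
```

The exhaustive search makes the cost issue mentioned above concrete: the number of candidate subsets grows exponentially with the number of features, and each candidate requires retraining the learner.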
Table 7.2 summarizes the computation of the accuracy metric and some derivatives that have been used in FS. The notation used is: tp, true positives; fp, false positives; fn, false negatives; tn, true negatives; tpr = tp/(tp + fn), sample true positive rate; fpr = fp/(fp + tn), sample false positive rate; precision = tp/(tp + fp); recall = tpr.
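These derived rates can be computed directly from the confusion counts; a small sketch (the function name and dictionary layout are my own):

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Compute tp, fp, fn, tn from paired labels and predictions,
    then the derived rates: tpr (= recall), fpr, precision, accuracy."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "tpr": tp / (tp + fn),                    # recall = tpr
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```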
 