10.2.1 Relationship Between Feature Subset Selection and Frequent Subtree Interestingness
In general, the objective of feature subset selection, as defined in [18], is to find a
minimum set of attributes such that the resulting probability distribution of the data
classes is as close as possible to the original distribution obtained using all attributes.
Han and Kamber [18] note that domain expertise can be employed to pick out useful
attributes. However, because data mining tasks involve large volumes of data whose
behaviour is difficult to anticipate in advance, manual attribute selection is often
expensive and time consuming.
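As a toy illustration of this objective, the minimal sketch below (hypothetical data and attribute names; pandas is assumed to be available) compares the class distribution obtained after projecting the data onto a candidate attribute subset against the distribution obtained using all attributes. If the two distributions agree, the remaining attributes add nothing to class discrimination and can be dropped.

```python
import pandas as pd

# Hypothetical toy records: two attributes carry class information,
# one ("noise") is irrelevant.
data = pd.DataFrame({
    "colour": ["red", "red", "blue", "blue", "red", "blue"],
    "shape":  ["box", "box", "box", "ball", "ball", "ball"],
    "noise":  ["x", "y", "x", "y", "x", "y"],
    "cls":    ["A", "A", "B", "B", "A", "B"],
})

def class_distribution(df, attributes):
    """Class proportions within each group defined by the given attributes."""
    return df.groupby(attributes)["cls"].value_counts(normalize=True)

# Distribution using all attributes vs. using only {colour}.
print(class_distribution(data, ["colour", "shape", "noise"]))
print(class_distribution(data, ["colour"]))
# Here {colour} alone reproduces the class distribution, so the other
# attributes are redundant for this toy classification task.
```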
Tests of statistical significance are among the prominent approaches for evaluating
attribute/feature usefulness. Stepwise forward selection, stepwise backward
selection, and a combination of the two are three commonly used heuristic techniques
employed with statistical significance tests such as linear regression and logistic
regression [18]. Correlation analysis, such as the chi-square test, is also valuable
for identifying redundant variables during feature subset selection.
Another powerful technique for this purpose is the Symmetrical Tau [54],
a statistical-heuristic feature selection criterion that measures the capability of one
attribute to predict the class of another attribute. Information gain is a further
attribute relevance analysis method, employed in the popular ID3 [33]
and C4.5 [34] decision tree algorithms, as reported in [18]. An extensive overview and
comparison of the different approaches to the feature subset selection problem is
provided in [6, 11, 21, 30].
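As an illustration of the Symmetrical Tau criterion, the sketch below computes it from a contingency table whose rows are the values of a candidate attribute and whose columns are the class labels. The formulation used is the common Goodman-Kruskal-based definition; this is an assumption made here for illustration rather than a detail given in this section. A higher value indicates an attribute with greater capability to predict the class.

```python
import numpy as np

def symmetrical_tau(contingency):
    """Symmetrical Tau for a contingency table.

    Rows index the values of the candidate attribute, columns index the
    class labels; cell (i, j) counts records with attribute value i and class j.
    """
    P = np.asarray(contingency, dtype=float)
    P = P / P.sum()                          # joint probabilities P_ij
    row = P.sum(axis=1)                      # attribute-value marginals P_i+
    col = P.sum(axis=0)                      # class marginals P_+j
    num = ((P ** 2 / col).sum()              # sum_ij P_ij^2 / P_+j
           + (P ** 2 / row[:, None]).sum()   # sum_ij P_ij^2 / P_i+
           - (row ** 2).sum()
           - (col ** 2).sum())
    den = 2.0 - (row ** 2).sum() - (col ** 2).sum()
    return num / den

print(symmetrical_tau([[40, 5], [5, 50]]))    # strongly class-predictive attribute
print(symmetrical_tau([[25, 25], [25, 25]]))  # uninformative attribute (tau = 0)
```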
While the original purpose of feature subset selection is to reduce the set of attributes
to only those relevant for a certain data mining task, the same measures can nevertheless
be utilized to assess the interestingness of the rules/patterns generated. For
example, if a rule/pattern consists of irrelevant attributes, such a measure can give
an indication that the rule/pattern is not interesting. Moreover,
[12] identifies three roles of interestingness measures. The first is their
ability to discard uninteresting patterns during the mining process, thereby narrowing
the search space and improving mining efficiency. The second is to calculate
an interestingness score for each pattern, which allows patterns to be ranked
according to specific needs. The third is their use during the post-processing stage
to select interesting patterns. Interestingness measures
such as the chi-square test [8], Symmetrical Tau [54] and Mutual Information [44]
are capable of measuring the interestingness of rules while at the same time identifying
useful features for frequent patterns.
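As a rough sketch of the second and third roles, each frequent pattern's occurrence indicator can be scored against the class label and the patterns ranked during post-processing. The pattern names and data below are hypothetical, and the Mutual Information used is the standard empirical estimate, assumed here rather than taken from the formulation in [44].

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            p_x, p_y = np.mean(x == xv), np.mean(y == yv)
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

# Hypothetical post-processing: rank frequent patterns by the mutual
# information between their occurrence indicator and the class label.
classes = np.array([1, 1, 1, 0, 0, 0, 1, 0])
patterns = {
    "pattern_A": np.array([1, 1, 1, 0, 0, 0, 1, 0]),  # tracks the class
    "pattern_B": np.array([1, 0, 1, 0, 1, 0, 1, 0]),  # unrelated to the class
}
ranked = sorted(patterns,
                key=lambda p: mutual_information(patterns[p], classes),
                reverse=True)
print(ranked)  # class-correlated (interesting) patterns come first
```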
Since frequent patterns are generated based solely on frequency, without
considering their predictive power, using frequent patterns without selecting
appropriate features will still result in a huge feature space, which leads to a larger
volume and greater complexity of rules. This not only slows down the model learning
process but, even worse, can cause classification accuracy to deteriorate (another
kind of overfitting issue, since the features are numerous) [9].
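A minimal sketch of such pruning is given below: only the patterns whose occurrence is most strongly associated with the class are kept before a classifier is learned. The data are hypothetical, and scikit-learn's SelectKBest/chi2 and BernoulliNB are used here purely as one convenient choice of tooling, not as techniques prescribed by this section.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 500))        # 500 binary pattern features
y = (X[:, 0] | X[:, 1]).astype(int)            # only a few patterns matter

selector = SelectKBest(score_func=chi2, k=20)  # discard weakly predictive patterns
X_small = selector.fit_transform(X, y)

clf = BernoulliNB().fit(X_small, y)            # learn on the reduced feature space
print(X.shape, "->", X_small.shape)
```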
 