applications. More often than not, we have to perform feature selection
in order to obtain meaningful results. 3 Feature selection has the following
prominent functions:
Enabling: Feature selection renders the impossible possible. Every data
mining algorithm is limited in the size, type, and format of the data it
can handle. When a data set is too large, it may not be possible to run a
data mining algorithm at all, or the data mining task cannot be carried
out effectively without data reduction. 4 Feature selection reduces the
data and enables a data mining algorithm to function and work effectively
with huge data.
Focusing: The data includes almost everything in a domain (recall that data
is not solely collected for data mining), but one application is normally
only about one aspect of the domain. It is natural and sensible to focus
on the relevant part of the data for the application so that the search is
more focused and the mining is more efficient.
Cleaning: The GIGO (garbage-in-garbage-out) principle 5 applies to almost
all, if not all, data mining algorithms. It is therefore paramount to clean
data, if possible, before mining. By selecting relevant instances, we can
usually remove irrelevant ones as well as noise and/or redundant data.
High-quality data will lead to high-quality results and reduced costs
for data mining.
The generation procedure produces a subset of features that are relevant
to the target concept. 6 There are basically three kinds of generation
procedure, which are listed below.
If the original feature set contains N features, then the total number of
competing candidate subsets to be generated is 2^N. This is a huge number
even for medium-sized N. There are different approaches for solving this
problem, namely: complete, heuristic, and random.
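
To make the 2^N figure concrete, the short Python sketch below enumerates every candidate subset of a small feature set and then prints how quickly the count grows. The feature names and the helper candidate_subsets are hypothetical, introduced only for illustration.

```python
from itertools import combinations


def candidate_subsets(features):
    """Yield every subset of `features`, including the empty set."""
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            yield subset


# Hypothetical feature names, used only for illustration.
features = ["f1", "f2", "f3", "f4"]
subsets = list(candidate_subsets(features))
print(len(subsets))  # 16, i.e. 2**4

# The count doubles with every added feature.
for n in (10, 20, 30, 40):
    print(n, 2 ** n)
```

Even at N = 40 the count exceeds a trillion, which is why enumerating and evaluating every subset quickly becomes impractical.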
Complete:
This generation procedure does a complete search for the
optimal subset according to the evaluation function used. An exhaustive
search is complete. However, Schlimmer argues 7 that “just because the
search must be complete does not mean that it must be exhaustive.” Different
heuristic functions are used to reduce the search without jeopardizing
the chances of finding the optimal subset. 8 Hence, although the order of
the search space is O(2^N), fewer subsets are evaluated. The optimality
of the feature subset, according to the evaluation function, is guaranteed
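
The idea that a complete search need not be exhaustive can be sketched with branch and bound. This technique is not named in the passage above and is assumed here purely as an illustration. The sketch looks for the best subset of exactly k features and relies on one key assumption: the evaluation function is monotonic, meaning that removing a feature never increases the score. The function branch_and_bound, the integer weights, and the score function are all hypothetical.

```python
def branch_and_bound(n_features, k, score):
    """Find the best k-feature subset without evaluating every subset.

    Assumes `score` is monotonic: removing a feature never increases it.
    """
    best = {"score": float("-inf"), "subset": None, "evaluated": 0}

    def search(subset, start):
        best["evaluated"] += 1
        value = score(subset)
        # By monotonicity, no subset of `subset` can score above `value`,
        # so the whole branch is skipped without losing the optimum.
        if value <= best["score"]:
            return
        if len(subset) == k:
            best["score"], best["subset"] = value, subset
            return
        # Remove one more feature. Positions are removed in increasing
        # order so each subset is visited once; the upper limit k keeps
        # enough removable positions to still reach size k.
        for i in range(start, k + 1):
            search(subset[:i] + subset[i + 1:], i)

    search(tuple(range(n_features)), 0)
    return best


# Hypothetical monotonic evaluation function: sum of non-negative weights.
weights = [9, 1, 5, 3, 7]


def score(subset):
    return sum(weights[i] for i in subset)


result = branch_and_bound(len(weights), 2, score)
print(result["subset"], result["score"], result["evaluated"])
```

The pruning test is what keeps the search complete: a branch is discarded only when the bound shows that none of its subsets can beat the best one already found.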