PCA [7]. Nonlinear models are LLE [25], ISOMAP [26] and derivatives. They are
concerned with the transformation of the original variables into a smaller number
of projections. The underlying assumptions are that the variables are numeric and
that the dimensions can be expressed as combinations of the actual variables, and
vice versa. Further analysis of this type of technique is given in this chapter,
especially for the two most popular techniques: PCA and LLE.
A set of methods is aimed at eliminating irrelevant and redundant features,
reducing the number of variables in the model. They belong to the FS family of
methods. They have the following immediate positive effects on the analysis and
mining:
• Speed up the processing of the DM algorithm.
• Improve data quality.
• Increase the performance of the DM algorithm.
• Make the results easier to understand.
Formally, the problem of FS can be defined as follows [14]: Let $A$ be the original set
of features, with cardinality $m$. Let $d$ represent the desired number of features in the
selected subset $B$, $B \subseteq A$. Let the FS criterion function for the set $B$ be
represented by $J(B)$. Without any loss of generality, a lower value of $J$ is considered
to indicate a better feature subset; thus, $J$ could represent the generalization error.
The problem of FS is to find an optimal subset $B$ that solves the following optimization
problem:

\[
\min_{Z} \; J(Z) \quad \text{s.t.} \quad Z \subseteq A, \quad |Z| = d
\]

A brute force search would require examining all $\frac{m!}{d!\,(m-d)!}$ possible
combinations of the feature set $A$. A vast number of FS approaches, trends and
applications have been proposed over the years, and therefore FS deserves a complete
chapter of this book: Chap. 7.
Other forms of widely used DR also deserve to be described in this section. They
are slightly more complicated than those previously seen, but are also very widely used
in conjunction with advanced DM approaches and real applications.
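Returning to the FS formulation given earlier, the brute force search over all subsets of size d can be illustrated with a minimal Python sketch. The function name brute_force_fs, the feature names and the additive toy criterion are hypothetical and chosen only for illustration; in practice J would be something like an estimated generalization error.

```python
from itertools import combinations
from math import comb


def brute_force_fs(features, d, criterion):
    """Return the size-d subset Z of `features` that minimises criterion(Z).

    Every one of the m! / (d! (m - d)!) candidate subsets is evaluated,
    so this is only feasible for small m.
    """
    best_subset, best_value = None, float("inf")
    for subset in combinations(features, d):
        value = criterion(subset)
        if value < best_value:  # lower J means a better subset
            best_subset, best_value = subset, value
    return best_subset, best_value


# Hypothetical per-feature costs, used as a stand-in for a real criterion J.
toy_cost = {"f1": 0.9, "f2": 0.2, "f3": 0.5, "f4": 0.1}


def toy_criterion(subset):
    # Assumed additive criterion; a real J is rarely this simple.
    return sum(toy_cost[f] for f in subset)


best, value = brute_force_fs(sorted(toy_cost), 2, toy_criterion)
n_subsets = comb(len(toy_cost), 2)  # number of subsets examined
```

With m = 4 and d = 2 only six subsets are examined, but the count grows quickly: choosing d = 10 out of m = 50 features already yields more than ten billion candidates, which is why exhaustive search is rarely practical.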
6.2.1 Principal Components Analysis
In this subsection, we introduce Principal Components Analysis (PCA) as a
DR method [17]. A detailed theoretical explanation is beyond the scope of this topic;
hence, we intend to give details on the basic idea, the method of operation and the
objectives this technique pursues. PCA is one of the oldest and most widely used methods
for the reduction of multidimensional data.
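As a rough illustration of the basic idea, the sketch below projects numeric data onto a single direction of maximum variance. It is not the procedure described in [17]: it assumes the leading direction is the top eigenvector of the sample covariance matrix and approximates it with power iteration, a technique chosen here only to keep the example free of external libraries. The function name first_principal_component is hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, projected scores) for the leading component.

    Assumes `rows` holds at least two numeric observations with non-zero
    variance, and that the start vector is not orthogonal to the leading
    eigenvector of the covariance matrix.
    """
    n, m = len(rows), len(rows[0])
    # Centre each variable at zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix (m x m).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each observation is reduced to a single coordinate along v.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores


# Toy data lying exactly on the line y = 2x: one projection keeps everything.
data = [(float(x), 2.0 * x) for x in range(5)]
direction, scores = first_principal_component(data)
```

For this toy data the recovered direction is proportional to (1, 2), so the two original variables collapse into one projection without losing information; real data would generally retain only part of the variance.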
 