to recognize those informative features that allow the same classification
accuracy as in the original space to be achieved. The basic idea of DBFE is that,
when moving along the direction of the decision boundary, the classification of each
observation remains unchanged (see Fig. 4.1a). Hence, the direction of the decision
boundary is redundant. In contrast, moving along the direction normal to the decision
boundary changes the classification, so the normal represents an informative direction.
Moreover, the effectiveness of a direction is directly proportional to the area of the
decision boundary that has that direction as its normal vector. To illustrate this
statement, consider Fig. 4.1b. There, the border is a rectangle parallel to the axes,
so the informative directions defined by the normal vectors to the border are the x and
y axes themselves. Although both directions are informative, the x-axis is more
important, since projecting the data onto it results in less class overlap than
projecting the data onto the y-axis.
The idea is formalized by the notion of Effective Decision Boundary Feature
Matrix (EDBFM):
$$
\mathrm{EDBFM} = \frac{1}{\int_{S} p(x)\,dx} \int_{S} N^{T}(x)\, N(x)\, p(x)\,dx, \tag{4.3}
$$
where $N(x)$ is the normal vector at a point $x$, $N^{T}(x)$ denotes the transposed normal
vector and $S$ is the portion of decision boundary containing most of the training data
(the effective decision boundary). It has been proved [25] that:
- the rank of the EDBFM represents the intrinsic discriminant dimension, that is,
  the minimum number of feature vectors needed to achieve the same Bayes error
  probability as in the original space;
- the eigenvectors of EDBFM corresponding to nonzero eigenvalues are the necessary
  feature vectors.
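As a rough illustration of Eq. (4.3) and the two properties above, the integral can be approximated by a weighted sum over a finite set of boundary normals. The sketch below is not taken from [25]: the function name `edbfm`, the rectangle dimensions, and the choice to weight each side by its length (that is, a uniform density along the border) are all assumptions made for the example. Each normal is treated as a row vector, so that N^T N is a square matrix.

```python
import numpy as np

def edbfm(normals, weights=None):
    """Discrete approximation of Eq. (4.3): sum_i w_i n_i^T n_i / sum_i w_i.

    `normals` holds one boundary normal per row; `weights` stands in for
    p(x) dx at each sampled boundary piece (hypothetical discretisation).
    """
    normals = np.asarray(normals, dtype=float)
    # Make every normal a unit vector.
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    if weights is None:
        weights = np.ones(len(normals))
    weights = np.asarray(weights, dtype=float)
    # Weighted sum of outer products, normalised by the total weight.
    return np.einsum("i,ij,ik->jk", weights, normals, normals) / weights.sum()

# Rectangular border parallel to the axes (assumed width 1, height 3).
width, height = 1.0, 3.0
normals = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
# The two vertical sides (normal along x) have length `height`,
# the two horizontal sides (normal along y) have length `width`.
weights = np.array([height, height, width, width])

M = edbfm(normals, weights)
rank = np.linalg.matrix_rank(M)

# Sort eigenpairs by decreasing eigenvalue (eigh returns ascending order).
eigvals, eigvecs = np.linalg.eigh(M)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# A straight-line boundary has a single normal direction.
M_line = edbfm([[1.0, 1.0]])
rank_line = np.linalg.matrix_rank(M_line)
```

Under these assumptions the matrix for the rectangle is diagonal with entries 0.75 and 0.25, so both axes are needed (rank 2) and the x direction carries the larger eigenvalue because more of the border has its normal along x. For the straight-line boundary the rank drops to 1, since only one direction is informative.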
In [25], SVM Decision Boundary Analysis was proposed to construct a Bayes decision
border; the method combines the DBFE principle with the Support Vector Machine
algorithm. In [14], Analytical Decision Boundary Feature Extraction (ADBFE) is
introduced, in which the normal vectors are calculated analytically from the
equations of the decision border. All of these methods produce an EDBFM that
represents a data projection matrix onto a new feature space.
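To make the analytical computation of normals and the projection step more concrete, the following sketch uses a boundary given in closed form. It does not reproduce the procedure of [14]: the quadratic boundary, the helper `quadratic_normal`, the equal weighting of the sampled boundary points, and the eigenvalue threshold are all assumptions chosen for illustration. It relies only on the fact that the normal to a surface h(x) = 0 points along the gradient of h.

```python
import numpy as np

def quadratic_normal(x, A, b):
    """Unit normal at x to the surface x^T A x + b^T x + c = 0.

    The normal is the normalised gradient (A + A^T) x + b.
    """
    grad = (A + A.T) @ x + b
    return grad / np.linalg.norm(grad)

# Hypothetical border in a 3-D feature space: x1^2 + x2^2 = 1.
# The third feature does not appear in the equation, so it is redundant.
A = np.diag([1.0, 1.0, 0.0])
b = np.zeros(3)

# Sample points on the border (equally spaced, equally weighted).
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
boundary_pts = np.column_stack(
    [np.cos(theta), np.sin(theta), np.zeros_like(theta)]
)
normals = np.array([quadratic_normal(p, A, b) for p in boundary_pts])

# Discrete EDBFM estimate: average of the outer products n^T n.
M = normals.T @ normals / len(normals)

# Keep the eigenvectors whose eigenvalues are not (numerically) zero.
eigvals, eigvecs = np.linalg.eigh(M)
keep = eigvals > 1e-10
P = eigvecs[:, keep]          # columns span the new feature space

# Project some made-up observations onto the retained directions.
X = np.array([[0.5, -1.0, 7.0],
              [2.0, 0.3, -4.0]])
X_proj = X @ P
```

Here the estimated matrix has two nonzero eigenvalues, so `P` has two columns and the third original feature contributes nothing to `X_proj`. In this sketch the projection is built from the eigenvectors of the EDBFM rather than from the matrix itself, which is one reading of the statement above.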
4.4 Feature Ranking Based on Effective Decision Boundary Feature Matrix
4.4.1 Geometric Considerations
As introduced in previous sections, it is desirable to obtain a ranking of
real features on the basis of information contained in EDBFM. The idea is intuitively
 