Table 7.1 Distance measures for numeric variables (between X and Y)

Distance measure                          Mathematical form
Euclidean distance                        $D_e = \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2}$
City-block distance                       $D_{cb} = \sum_{i=1}^{n} |x_i - y_i|$
Chebyshev distance                        $D_{ch} = \max_i |x_i - y_i|$
Minkowski distance of order m             $D_M = \left( \sum_{i=1}^{n} |x_i - y_i|^m \right)^{1/m}$
Quadratic distance (Q positive definite)  $D_q = \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - y_i) Q_{ij} (x_j - y_j)$
Canberra distance                         $D_{ca} = \sum_{i=1}^{n} \frac{|x_i - y_i|}{x_i + y_i}$
Angular separation                        $D_{as} = \frac{\sum_{i=1}^{n} x_i y_i}{\left[ \sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2 \right]^{1/2}}$

(n: number of components of X and Y)
$$V(A_j) = \int \sum_i P(c_i) \left( P(c_i \mid A_j = a) - P(c_i) \right)^2 P(A_j = a) \, dx.$$
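For a discrete feature, the integral over feature values reduces to a sum, so the measure V(A_j) can be estimated from frequency counts. The following is a minimal sketch under that assumption; the function name `variance_measure` is hypothetical, and probabilities are estimated as plain relative frequencies.

```python
from collections import Counter


def variance_measure(feature_values, class_labels):
    """Discrete estimate of V(A_j), assuming a categorical feature:
    sum_a P(A_j = a) * sum_i P(c_i) * (P(c_i | A_j = a) - P(c_i))**2
    """
    n = len(feature_values)
    class_counts = Counter(class_labels)
    value_counts = Counter(feature_values)
    joint_counts = Counter(zip(feature_values, class_labels))

    total = 0.0
    for a, n_a in value_counts.items():
        p_a = n_a / n  # P(A_j = a)
        for c, n_c in class_counts.items():
            p_c = n_c / n  # P(c_i)
            p_c_given_a = joint_counts[(a, c)] / n_a  # P(c_i | A_j = a)
            total += p_a * p_c * (p_c_given_a - p_c) ** 2
    return total
```

A feature that is independent of the class yields 0, since every conditional probability equals the corresponding prior; larger values indicate that knowing the feature value changes the class distribution more.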
Moreover, some of the most common distance measures for numeric variables
used in FS are summarized in Table 7.1.
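The measures of Table 7.1 can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation: the function names are hypothetical, X and Y are assumed to be equal-length sequences of numbers, the Minkowski distance is assumed to use absolute differences (so that odd orders remain non-negative), and the Canberra distance assumes that x_i + y_i is never zero.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, m):
    # Assumption: absolute differences, so m = 1 gives city-block, m = 2 Euclidean.
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1.0 / m)


def quadratic(x, y, Q):
    # Q is a positive definite matrix given as a list of rows.
    d = [a - b for a, b in zip(x, y)]
    size = len(d)
    return sum(d[i] * Q[i][j] * d[j] for i in range(size) for j in range(size))


def canberra(x, y):
    # Assumption: x_i + y_i != 0 for every i.
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y))


def angular_separation(x, y):
    # Assumption: neither x nor y is the zero vector.
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den
```

Note that angular separation is a similarity rather than a distance in the usual sense: it equals 1 when the two vectors point in the same direction.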
7.2.2.3 Dependence Measures
They are also known as measures of association or correlation. Their main goal is
to quantify how strongly two variables are correlated or associated with each
other, in such a way that, knowing the value of one of them, we can derive the
value of the other. In feature evaluation, the common procedure is to measure the
correlation between each feature and the class. Denoting by $R(A)$ a dependence
measure between feature A and class C, we choose feature $A_i$ over feature $A_j$
if $R(A_i) > R(A_j)$. In other words, the feature most correlated with the class
is chosen.
If A and C are statistically independent, they are not correlated and removing A
should not affect the class separability regarding the rest of the features.
Otherwise, the feature should be selected because it could somewhat explain the
trend of the class.
One of the most used dependence measures is the Pearson correlation coefficient,
which measures the degree of linear correlation between two variables. For two
variables X and Y with measurements $\{x_i\}$ and $\{y_i\}$, means $\bar{x}$ and
$\bar{y}$, this is given by

$$\rho(X, Y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\left[ \sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2 \right]^{1/2}}$$
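The Pearson coefficient and its use for ranking features against the class can be sketched as follows. This is only an illustration under stated assumptions: the names `pearson` and `rank_features` are hypothetical, the class is assumed to be encoded numerically, no variable is constant (otherwise the denominator is zero), and the absolute value of the coefficient is taken as the dependence measure R(A), which is one possible choice rather than the only one.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Return feature names sorted by |rho| with the target, strongest first.

    `features` maps a feature name to its list of values (assumed layout).
    """
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

As a quick check of what the coefficient captures, `pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])` evaluates to 0 even though the second variable is exactly the square of the first.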
If two variables are very correlated ($\rho \approx \pm 1$), one of them could be removed.
However, linear correlations are not able to detect relationships that are not linear.
Correlations with respect to the target variable can also be computed in order to