χ² table using the appropriate degrees of freedom, or from any available software that is
able to provide this value. If the significance level obtained is below the established
one (or the computed statistic is above the critical value in the table), we can say
that the null hypothesis is rejected and, therefore, that A and B are statistically correlated.
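The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the contingency table `observed` is made-up data, the function names are hypothetical, and the critical value 3.841 is the standard χ² table entry for 1 degree of freedom at the 0.05 significance level.

```python
def chi_squared_statistic(observed):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (count - expected) ** 2 / expected
    return stat


def degrees_of_freedom(observed):
    """(rows - 1) * (columns - 1) for a contingency table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Assumption: critical value for 1 degree of freedom at the 0.05 level,
# taken from a standard chi-squared table.
CRITICAL_VALUE_DF1_ALPHA_005 = 3.841

# Hypothetical 2x2 table of co-occurrence counts for attributes A and B.
observed = [[250, 200], [50, 1000]]

stat = chi_squared_statistic(observed)
dof = degrees_of_freedom(observed)

# Reject the null hypothesis (no association) when the computed statistic
# exceeds the critical value for the chosen significance level.
reject_null = dof == 1 and stat > CRITICAL_VALUE_DF1_ALPHA_005
```

For tables with other degrees of freedom or significance levels, the critical value would have to be looked up again or obtained from statistical software.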
3.2.1.2 Correlation Coefficient and Covariance for Numeric Data
When we have two numerical attributes, checking whether they are highly correlated
or not is useful to determine if they are redundant. The most well-known correlation
coefficient is Pearson's product-moment coefficient, given by:
$$
r_{A,B} = \frac{\sum_{i=1}^{m} (a_i - \bar{A})(b_i - \bar{B})}{m \, \sigma_A \sigma_B}
        = \frac{\sum_{i=1}^{m} (a_i b_i) - m \bar{A} \bar{B}}{m \, \sigma_A \sigma_B},
\tag{3.3}
$$
where $m$ is the number of instances, $a_i$ and $b_i$ are the values of attributes A and B in
the instances, $\bar{A}$ and $\bar{B}$ are the mean values of A and B respectively, and
$\sigma_A$ and $\sigma_B$ are the standard deviations of A and B.
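As a minimal sketch, both forms of Eq. (3.3) can be computed directly with the Python standard library. The function names and the sample values are hypothetical, and the code assumes that the standard deviations are population standard deviations (dividing by m), which is what the factor m in the denominator suggests.

```python
import math


def mean(values):
    return sum(values) / len(values)


def population_std(values):
    # Assumption: standard deviation computed with divisor m (not m - 1).
    mu = mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def pearson_centered(a, b):
    """First form of Eq. (3.3): sum of products of deviations from the means."""
    m = len(a)
    mean_a, mean_b = mean(a), mean(b)
    numerator = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return numerator / (m * population_std(a) * population_std(b))


def pearson_raw(a, b):
    """Second form of Eq. (3.3): sum of raw products minus m * mean_a * mean_b."""
    m = len(a)
    numerator = sum(x * y for x, y in zip(a, b)) - m * mean(a) * mean(b)
    return numerator / (m * population_std(a) * population_std(b))


# Hypothetical attribute values for five instances.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.1, 5.9, 8.2, 9.8]

r_centered = pearson_centered(a, b)
r_raw = pearson_raw(a, b)
```

The two functions should agree up to floating-point rounding. The centered form is usually the safer choice numerically, since the raw form subtracts two potentially large quantities.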
Please note that $-1 \le r_{A,B} \le +1$. When $r_{A,B} > 0$, it means that the two attributes
are positively correlated: when the values of A increase, the values of B tend to
increase too. The higher the coefficient is, the stronger the correlation between
them is. A high value of $r_{A,B}$ could also indicate that one of the two attributes
can be removed.
When $r_{A,B} = 0$, it implies that no linear correlation can be found between attributes
A and B (although this alone does not guarantee that they are independent). If
$r_{A,B} < 0$, then attributes A and B are negatively correlated: when the values of one
attribute increase, the values of the other tend to decrease. Scatter plots can be
useful to examine how correlated the data is and to visually check the results obtained.
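The idea of using a high coefficient to spot removable attributes could be sketched as follows. The threshold of 0.9, the function names and the toy columns are all assumptions made for illustration. Comparing the absolute value of the coefficient is also a choice of this sketch, so that strongly negative pairs are flagged as well.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson coefficient using population (divide-by-m) statistics."""
    m = len(a)
    mean_a, mean_b = sum(a) / m, sum(b) / m
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / m
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / m)
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / m)
    return cov / (std_a * std_b)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every attribute pair with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical data set: the same height in two units plus an unrelated code.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_color_code": [3.0, 1.0, 4.0, 1.0, 5.0],
}

redundant = highly_correlated_pairs(columns)
```

A flagged pair is only a candidate for removal. Which attribute to drop, if any, remains a judgment call that a scatter plot can help support.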
Similarly to correlation, covariance is a useful and widely used measure in statistics
for checking how much two variables change together. Considering that the mean values
are the expected values of attributes A and B, namely $E(A) = \bar{A}$ and
$E(B) = \bar{B}$, the covariance between both is defined as
$$
\mathrm{Cov}(A, B) = E\big((A - \bar{A})(B - \bar{B})\big)
                   = \frac{\sum_{i=1}^{m} (a_i - \bar{A})(b_i - \bar{B})}{m}.
\tag{3.4}
$$
It is easy to see the relation between the covariance and the correlation coefficient
$r_{A,B}$ given in Eq. (3.3), expressed as
$$
r_{A,B} = \frac{\mathrm{Cov}(A, B)}{\sigma_A \sigma_B}.
\tag{3.5}
$$
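A short sketch of Eqs. (3.4) and (3.5) follows. The function names and the sample values are hypothetical, and the standard deviation is again assumed to be the population one, which here is simply the square root of the covariance of an attribute with itself.

```python
import math


def covariance(a, b):
    """Eq. (3.4): mean of the products of deviations from the means."""
    m = len(a)
    mean_a, mean_b = sum(a) / m, sum(b) / m
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / m


def population_std(values):
    # Assumption: divide-by-m standard deviation, i.e. sqrt(Cov(X, X)).
    return math.sqrt(covariance(values, values))


def pearson_from_covariance(a, b):
    """Eq. (3.5): covariance normalized by the two standard deviations."""
    return covariance(a, b) / (population_std(a) * population_std(b))


# Hypothetical attribute values for four instances.
a = [2.0, 4.0, 6.0, 8.0]
b = [1.0, 3.0, 2.0, 5.0]

cov_ab = covariance(a, b)
r_ab = pearson_from_covariance(a, b)
```

Unlike the covariance, whose magnitude depends on the units of A and B, the normalized value always falls between -1 and +1, which makes coefficients from different attribute pairs comparable.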
If two attributes vary similarly, when $A > \bar{A}$ then probably $B > \bar{B}$, and thus the
covariance is positive. On the other hand, when one attribute tends to be above
its expected value whereas the other is below its expected value, the covariance is