5.2.2.1. Pearson's Correlation Coefficient
Pearson's correlation coefficient of sequences $x_i$ and $x_j$ is calculated as:

$$\rho_{ij} = \frac{S_{ij}}{\sqrt{S_{ii}\,S_{jj}}} \qquad (5.7)$$

where $S_{ij} = \frac{1}{N-1}\sum_{k=1}^{N}(x_{ki}-\bar{x}_i)(x_{kj}-\bar{x}_j)$ and $\bar{x}_i = \frac{1}{N}\sum_{k=1}^{N} x_{ki}$. For brevity, Pearson's correlation coefficient is usually called simply the correlation.
Readers should note that the correlation ranges from -1 to 1. When one
variable is a linear function of the other, $X_2 = aX_1 + b$, we have $\rho = 1$ if
$a > 0$ and $\rho = -1$ if $a < 0$. So, if one is only concerned with the extent of the
correlation but not its direction, one can use the square or the absolute value of $\rho$
as the similarity measure.
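Eq. (5.7) can be sketched directly in Python (the function name `pearson` and the toy data below are this illustration's own, not from the text). Note that any constant factor in front of the sums, such as $\frac{1}{N-1}$, cancels between numerator and denominator, so plain sums suffice:

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient, Eq. (5.7): rho = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# A perfect linear relation X2 = a*X1 + b gives rho = 1 for a > 0
# and rho = -1 for a < 0, matching the property stated in the text.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson(x1, [2.0 * v + 1.0 for v in x1]))   # 1.0
print(pearson(x1, [-2.0 * v + 1.0 for v in x1]))  # -1.0
```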
Correlation captures the linear dependency between two sequences, as illustrated
in Fig. 5.2(a) and (b). However, it does not capture nonlinear dependency,
as shown in Fig. 5.2(c), where $X_2 = \sin(X_1) + \epsilon$, and $\epsilon$ is a normally
distributed noise term.
Fig. 5.2. Correlation of two variables: (a) and (b) linear dependence; (c) nonlinear dependence.
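The failure mode of Fig. 5.2(c) can be reproduced numerically; the sampling range, sample size, and noise scale below are choices of this sketch, not values from the text. Sampling $X_1$ over many periods of the sine keeps any single linear trend from dominating, so $\rho$ comes out small in magnitude even though $X_2$ is strongly determined by $X_1$:

```python
import math
import random

def pearson(x, y):
    """Pearson's correlation coefficient, Eq. (5.7)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

random.seed(0)
# X2 = sin(X1) + epsilon, with epsilon a small Gaussian noise term,
# sampled over many periods of the sine (range and noise scale are
# arbitrary choices for this illustration).
x1 = [random.uniform(0.0, 20.0 * math.pi) for _ in range(2000)]
x2 = [math.sin(v) + random.gauss(0.0, 0.1) for v in x1]
# rho is small despite the strong (nonlinear) dependence of X2 on X1
print(round(pearson(x1, x2), 3))
```

This is exactly the gap that mutual information, introduced next, is meant to close.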
5.2.2.2. Mutual Information
In addition to linear dependency as shown in Fig. 5.2(a) and (b), mutual infor-
mation is also capable of capturing nonlinear dependency such as the one shown
in Fig. 5.2(c). Mutual information comes from information theory and is based
on Shannon entropy. The mutual information between two sequences x i and x j is