Cluster methods can be further divided into several types. Divisive
clustering begins by considering all genes as a single group, which is then
partitioned into subgroups in a way that maximizes the difference
between them. In contrast, agglomerative clustering begins by grouping
the two genes with the most similar expression patterns, and then
treating them as a single entity in the succeeding steps. At each later
step it either groups the next most similar pair of genes or, if no pair
is more similar, joins an existing group to another gene. In both cases,
it is necessary to provide a rigorous mathematical measure of
dissimilarity, and we turn to this question next.
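The agglomerative procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the text's own algorithm: the text does not say how the dissimilarity between two groups is computed, so the sketch assumes average linkage (the mean pairwise distance between members) and uses the Euclidean distance that is formalized later in this section. The names `agglomerate`, `average_linkage`, and `toy_profiles` are hypothetical, and the data are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def average_linkage(cluster_a, cluster_b, profiles, dist):
    """Mean pairwise distance between two clusters (an assumed linkage rule)."""
    total = sum(dist(profiles[i], profiles[k]) for i in cluster_a for k in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(profiles, dist=euclidean):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(i,) for i in range(len(profiles))]  # every gene starts alone
    merges = []
    while len(clusters) > 1:
        # Find the pair of current clusters with the smallest dissimilarity.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles, dist)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Treat the merged pair as a single entity in later steps.
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical data: 4 genes (rows) measured under 3 conditions (columns).
toy_profiles = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [5.0, 5.0, 5.0],
    [5.2, 4.9, 5.1],
]
history = agglomerate(toy_profiles)
for left, right, d in history:
    print(left, right, round(d, 3))
```

Each entry of `history` records which two groups were joined and at what dissimilarity, which is the information a dendrogram displays. The exhaustive pair search is chosen for readability, not speed.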
Mathematically, the dissimilarity measure (see footnote 5) between gene
expression profiles can be defined as a function of the respective rows of
the data matrix X from Eq. (12-1) that quantitatively determines how
different the gene expressions are. Using
$x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ and
$x_k = (x_{k1}, x_{k2}, \ldots, x_{kn})$ to denote the ith and kth rows of
X, we denote the dissimilarity measure between them by
$d_{i,k} = d(x_i, x_k)$. A variety of choices exists for the specific
functional form of $d(x_i, x_k)$, but it must satisfy the following
distance axioms:
1. $d(x_i, x_k) \ge 0$ for any two vectors $x_i, x_k$; that is, the
distance should never be negative;
2. $d(x_i, x_k) = d(x_k, x_i)$; that is, the distance should be symmetric;
and
3. $d(x_i, x_k) \le d(x_i, x_s) + d(x_s, x_k)$ for any vector $x_s$; that
is, the distance should satisfy the triangle inequality.
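As a rough illustration, the three axioms can be spot-checked numerically for a candidate function on a handful of sample vectors. The sketch below is an assumption-laden aid, not part of the text: the helper `check_distance_axioms`, the tolerance `tol`, and the sample points are all hypothetical. Passing the check on a sample does not prove that a function satisfies the axioms in general; failing it does show a violation. The squared Euclidean distance is included only as a counterexample that breaks the triangle inequality.

```python
import itertools
import math


def check_distance_axioms(dist, vectors, tol=1e-9):
    """Return the names of the axioms that fail on the given sample vectors.

    An empty list means no violation was found on this sample only.
    """
    failed = set()
    for a, b in itertools.product(vectors, repeat=2):
        if dist(a, b) < -tol:
            failed.add("non-negativity")
        if abs(dist(a, b) - dist(b, a)) > tol:
            failed.add("symmetry")
    for a, s, b in itertools.product(vectors, repeat=3):
        if dist(a, b) > dist(a, s) + dist(s, b) + tol:
            failed.add("triangle inequality")
    return sorted(failed)


def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def squared_euclidean(a, b):
    """Counterexample: dropping the square root breaks the triangle inequality."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


# Hypothetical sample points; three of them are collinear on purpose.
sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.5, 3.0)]
euclid_failures = check_distance_axioms(euclidean, sample)
squared_failures = check_distance_axioms(squared_euclidean, sample)
print(euclid_failures)
print(squared_failures)
```

For the collinear points (0, 0), (1, 0), (2, 0), the squared distance from the first to the third is 4, while the two intermediate squared distances add up to only 2, so the third axiom fails.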
By far, the most commonly used dissimilarity measure is based on the
Euclidean distance, defined as
$$d_E(x_i, x_k) = \sqrt{\sum_{j=1}^{n} (x_{ij} - x_{kj})^2}. \tag{12-3}$$
Other commonly used distance measures are the Pearson correlation
distance, given by
$$d_P(x_i, x_k) = 1 - \frac{1}{n} \sum_{j=1}^{n} \left(\frac{x_{ij} - \bar{x}_i}{s_i}\right) \left(\frac{x_{kj} - \bar{x}_k}{s_k}\right), \tag{12-4}$$
the Manhattan or block distance, defined as
$$d_B(x_i, x_k) = \sum_{j=1}^{n} |x_{ij} - x_{kj}|, \tag{12-5}$$
and the Chebyshev distance:
$$d_c(x_i, x_k) = \max_j |x_{ij} - x_{kj}|. \tag{12-6}$$
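The four measures in Eqs. (12-3) through (12-6) translate directly into code. The following is a minimal sketch, not a reference implementation. It assumes that in Eq. (12-4) the barred symbols are the row means and that s_i and s_k are standard deviations computed with the 1/n (population) form, which is what makes the sum equal to the Pearson correlation coefficient; it also assumes that neither profile is constant, since a zero standard deviation would cause a division by zero. The function names and the example profiles are hypothetical.

```python
import math


def euclidean_distance(x_i, x_k):
    """Eq. (12-3): square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_k)))


def pearson_distance(x_i, x_k):
    """Eq. (12-4): one minus the Pearson correlation of the two profiles.

    Assumes population (1/n) standard deviations and non-constant profiles.
    """
    n = len(x_i)
    mean_i = sum(x_i) / n
    mean_k = sum(x_k) / n
    s_i = math.sqrt(sum((a - mean_i) ** 2 for a in x_i) / n)
    s_k = math.sqrt(sum((b - mean_k) ** 2 for b in x_k) / n)
    corr = sum(
        ((a - mean_i) / s_i) * ((b - mean_k) / s_k) for a, b in zip(x_i, x_k)
    ) / n
    return 1.0 - corr


def manhattan_distance(x_i, x_k):
    """Eq. (12-5): sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x_i, x_k))


def chebyshev_distance(x_i, x_k):
    """Eq. (12-6): largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x_i, x_k))


# Hypothetical expression profiles over n = 4 conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same shape as gene_a, twice the magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]  # gene_a reversed, perfectly anti-correlated

for name, dist in [
    ("Euclidean", euclidean_distance),
    ("Pearson", pearson_distance),
    ("Manhattan", manhattan_distance),
    ("Chebyshev", chebyshev_distance),
]:
    print(name, dist(gene_a, gene_b), dist(gene_a, gene_c))
```

The example highlights how the measures differ in what they treat as "similar": `gene_a` and `gene_b` are far apart in the Euclidean, Manhattan, and Chebyshev sense because their magnitudes differ, yet their Pearson correlation distance is essentially zero because the profiles rise and fall together.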
5. Dissimilarity measures are also sometimes called distances.