Biology Reference
In-Depth Information
$$|AC| = d(A, C) = \sqrt{(1.5 - 1.1)^2 + (-0.4 - (-0.3))^2} = 0.412$$
[Plot: points A = (1.5, -0.4) and C = (1.1, -0.3) in the plane.]
FIGURE 12-10.
In the case of two tissue samples, the distance from Eq. (12-3) is the geometric distance between
points in the plane. The distance between the gene expressions A and C is depicted.
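The distance between A and C can be checked numerically. The following is a minimal sketch in Python; the function name `euclidean_distance` is illustrative and not from the text, and it assumes that Eq. (12-3) is the ordinary Euclidean distance, as the figure caption suggests.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Expression values of genes A and C in the two tissue samples
A = (1.5, -0.4)
C = (1.1, -0.3)

print(f"{euclidean_distance(A, C):.3f}")  # 0.412
```

The same function works unchanged for more than two tissue samples, since it sums over however many coordinates each gene has.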
$$
D =
\begin{array}{c|cccccc}
  & A & B & C & D & E & F \\ \hline
A & 0.000 & 0.141 & 0.412 & 2.846 & 3.138 & 3.158 \\
B & 0.141 & 0.000 & 0.361 & 2.786 & 3.087 & 3.081 \\
C & 0.412 & 0.361 & 0.000 & 2.435 & 2.731 & 2.746 \\
D & 2.846 & 2.786 & 2.435 & 0.000 & 0.361 & 0.500 \\
E & 3.138 & 3.087 & 2.731 & 0.361 & 0.000 & 0.632 \\
F & 3.158 & 3.081 & 2.746 & 0.500 & 0.632 & 0.000
\end{array}
\tag{12-7}
$$
The clustering process begins with finding the smallest value in the
proximity matrix and merging the respective genes into a cluster. For
this proximity matrix, the smallest distance is 0.141, and thus the first
cluster will contain A and B. In the next step, the proximity matrix is
updated as follows: Genes A and B are replaced by the midpoint
between them, and the distances from the other genes to this midpoint
are calculated, resulting in a matrix with fewer rows and columns.
The process continues until all genes are merged into a single cluster.
For our example, after the initial cluster A/B is formed, D and E will be
merged together. The A/B group would then be clustered with C. The
D/E group would then be combined with F. Finally, we link the
A/B/C cluster to the D/E/F cluster, as we have no more genes to
link. A map of this clustering, called a dendrogram, is shown in
Figure 12-11. The lengths of the dendrogram branches denote the
distances at which the clusters are merged.
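The merge order described above can be reproduced from the proximity matrix (12-7) alone. The sketch below is one possible implementation, not the authors' code. It assumes the common definition of average linkage, in which the distance between two clusters is the mean of all pairwise gene distances. This differs slightly from the midpoint description in the text, but it needs no gene coordinates and yields the same merge order for this example. All names (`average_distance`, `agglomerate`) are illustrative.

```python
from itertools import combinations

LABELS = ["A", "B", "C", "D", "E", "F"]

# Proximity matrix (12-7): Euclidean distances between the six genes
D = [
    [0.000, 0.141, 0.412, 2.846, 3.138, 3.158],
    [0.141, 0.000, 0.361, 2.786, 3.087, 3.081],
    [0.412, 0.361, 0.000, 2.435, 2.731, 2.746],
    [2.846, 2.786, 2.435, 0.000, 0.361, 0.500],
    [3.138, 3.087, 2.731, 0.361, 0.000, 0.632],
    [3.158, 3.081, 2.746, 0.500, 0.632, 0.000],
]

def average_distance(c1, c2):
    """Mean pairwise distance between two clusters (tuples of gene indices)."""
    return sum(D[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

def agglomerate():
    """Merge the closest pair of clusters until a single cluster remains."""
    clusters = [(i,) for i in range(len(LABELS))]
    merges = []  # (name of merged cluster, distance at which it was formed)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        name = "".join(sorted(LABELS[i] for i in a + b))
        merges.append((name, average_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

merges = agglomerate()
for name, height in merges:
    print(name, round(height, 2))
```

Under these assumptions the clusters form in the order AB, DE, ABC, DEF, ABCDEF, and the final merge happens at a distance of about 2.89.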
The clustering method described here is known as average linkage, as
each cluster was represented by the midpoint between the newly
merged genes, and this midpoint was then used for updating the
proximity matrix. In complete linkage, on the other hand, the distances
between each gene in the new cluster and the genes in the other clusters
are calculated, and the largest distance is used in the proximity matrix.
In single linkage, the smallest of these distances is used as the distance
between the clusters. Several other linkage methods exist, such as the
centroid method and Ward linkage, and we refer the reader to Amaratunga
[Dendrogram: leaves A, B, C, D, E, F along the Observations axis; distance axis ticks at 0.00, 0.96, 1.93, 2.89.]
FIGURE 12-11.
Dendrogram with average linkage and Euclidean
distance for the gene expression matrix in
Table 12-3. Genes A and B have the most similar
expression patterns, followed by genes D and E.
The combined expression pattern of D and E is
similar to that of F, and the combined expression
pattern of A and B is similar to that of C. Finally,
the A/B/C cluster is linked to the D/E/F cluster.
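To make the difference between the linkage rules concrete, the sketch below computes the distance between the cluster {A, B} and gene C under single, complete, and average linkage, using the entries d(A, C) = 0.412 and d(B, C) = 0.361 from the proximity matrix. The helper `linkage_distance` is hypothetical, and "average" is taken here as the mean of the pairwise distances, which is an assumption rather than the midpoint construction described in the text.

```python
def linkage_distance(pairwise, method):
    """Combine gene-to-gene distances into one cluster-to-cluster distance."""
    if method == "single":      # smallest pairwise distance
        return min(pairwise)
    if method == "complete":    # largest pairwise distance
        return max(pairwise)
    if method == "average":     # mean of all pairwise distances
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")

# Distances from gene C to each member of the cluster {A, B}
ab_to_c = [0.412, 0.361]

for method in ("single", "complete", "average"):
    print(method, linkage_distance(ab_to_c, method))
```

Single linkage gives 0.361, complete linkage gives 0.412, and the average falls between the two. The choice of rule therefore changes the heights at which clusters join, and for less well-separated data it can change the merge order as well.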