[Figure: dendrogram whose leaves are labeled A to P]
FIGURE 2.10
An example of a dendrogram, such as may be output by a hierarchical clustering algorithm.
experiments is to explore patterns of gene expression over time, or over a range of
different experimental conditions. Genes with similar patterns of expression may be
regulated by the same transcription factor (TF), or may participate in the same
biological process. Clustering of microarray data is therefore a useful tool for
hypothesis generation.
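To make the idea of grouping genes by similar expression patterns concrete, here is a minimal sketch of agglomerative hierarchical clustering, the kind of procedure that produces a dendrogram like Figure 2.10. It assumes each gene's profile is a list of expression values across time points or conditions. It uses 1 minus the Pearson correlation as the distance and average linkage to compare clusters. These are common choices but are assumptions here, not a method the text prescribes. The function names, gene names, and values are hypothetical.

```python
import math
from itertools import combinations


def pearson_distance(x, y):
    """1 - Pearson correlation: near 0 when two profiles rise and fall together.

    Assumes neither profile is constant (zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - cov / (sx * sy))


def average_linkage(profiles):
    """Naive agglomerative clustering with average linkage.

    profiles: dict mapping gene name -> list of expression values.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    i.e. the information a dendrogram draws, from first merge to last.
    """
    names = list(profiles)
    dist = {frozenset(pair): pearson_distance(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [(name,) for name in names]  # every gene starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster gene pairs.
            d = sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)  # merge the closest pair of clusters
        merges.append((c1, c2, d))
    return merges


# Hypothetical expression profiles over four time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # same shape as geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 2.0],  # same shape as geneC
}
for a, b, d in average_linkage(profiles):
    print(a, "+", b, "at distance", round(d, 3))
```

Each tuple in the result corresponds to one internal node of the dendrogram. The exhaustive pair search is far too slow for genome-scale data, so in practice an optimized library routine would be used instead.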
There are, unsurprisingly, hundreds of clustering algorithms that have been applied
to microarray data. Cluster analysis is conceptually simple and potentially very
powerful. There is a clustering algorithm for almost any type of data, and the clustering
process can help to uncover biologically relevant patterns of relationships in otherwise
unmanageably large datasets. However, there are a number of caveats that must be borne
in mind when choosing and applying a clustering algorithm.
• Number of clusters: Many clustering algorithms require the user to specify how
many clusters exist in the data, and will return that number of clusters whether or
not the result is biologically plausible. k-means is one such algorithm. Unless
there is independent evidence for a specific number of clusters in the data, the
results of such an algorithm should be critically evaluated.
• Stochasticity: Clustering algorithms often incorporate an element of chance, for
example in which data item is selected at a given iteration of the algorithm.
Stochastic algorithms may produce different cluster memberships in different runs.
Such algorithms should therefore be run repeatedly and the results combined.
• Sensitivity to ordering: A related issue is that some algorithms can produce
different cluster memberships depending upon the order in which the input data
are presented.
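The first two caveats can be seen in a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The caller must supply k, and the initial centres are drawn at random from the data, so different seeds can give different memberships. This sketch combines repeated runs with a co-clustering (consensus) table, which records how often each pair of items lands in the same cluster. That is one common approach, assumed here rather than taken from the text. The function names and toy data are hypothetical.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, max_iter=100):
    """Lloyd's k-means. Returns one cluster label (0..k-1) per point.

    k must be supplied by the caller; the algorithm partitions the data into
    up to k clusters whether or not k groups really exist.
    """
    rng = random.Random(seed)
    # Random choice of k data points as starting centres: the source of
    # run-to-run variation in cluster membership.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # memberships unchanged: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def co_clustering(points, k, n_runs=20):
    """Fraction of runs (seeds 0..n_runs-1) in which each pair of points shares a cluster."""
    counts = {pair: 0 for pair in combinations(range(len(points)), 2)}
    for seed in range(n_runs):
        labels = kmeans(points, k, seed)
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / n_runs for pair, c in counts.items()}


# Toy 1-D data with two obvious groups (invented for illustration).
points = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
for seed in range(3):
    print("k=3, seed", seed, kmeans(points, 3, seed))  # labels may differ by seed
consensus = co_clustering(points, k=2)
print(consensus[(0, 1)], consensus[(0, 3)])
```

Pairs that co-cluster in nearly every run are robust. Pairs with intermediate fractions have memberships that depend on chance. Asking for k=3 on this two-group data still makes the algorithm split the points three ways, because it is built to return the number of clusters it is given.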
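Order sensitivity is easiest to see in a single-pass scheme. The text does not name a specific order-sensitive algorithm, so the sketch below uses a simple "leader" clustering rule purely as an illustration. Each value joins the first existing cluster whose leader lies within a threshold, and otherwise it founds a new cluster. The function name and threshold are hypothetical.

```python
def leader_clustering(values, threshold):
    """Single-pass 'leader' clustering of 1-D values (illustrative only).

    Each value joins the first cluster whose leader (founding value) lies
    within `threshold`; otherwise it starts a new cluster and leads it.
    """
    leaders = []
    clusters = []
    for v in values:
        for i, lead in enumerate(leaders):
            if abs(v - lead) <= threshold:
                clusters[i].append(v)
                break
        else:  # no leader close enough: start a new cluster
            leaders.append(v)
            clusters.append([v])
    return clusters


data = [1.0, 2.0, 3.0]
print(leader_clustering(data, threshold=1.0))        # [[1.0, 2.0], [3.0]]
print(leader_clustering(data[::-1], threshold=1.0))  # [[3.0, 2.0], [1.0]]
```

The same three values yield different memberships. The value 2.0 is grouped with 1.0 or with 3.0 depending only on which of them is presented first.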