Biomedical Engineering Reference
In-Depth Information
experimental treatment and one for a control array, you can visualize this as
a straight line between the two points in two-dimensional space.
Another commonly used distance metric is the Pearson correlation coefficient, which measures how correlated the profiles are. The Euclidean distance is very good at clustering together genes or samples that have a similar profile in amplitude, whereas the Pearson correlation is better at clustering together profiles with the same shape regardless of their amplitude.
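To make the distinction concrete, here is a minimal Python sketch (an illustration, not from the original text) computing both metrics on two hypothetical profiles that share a shape but differ in amplitude:

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_distance(x, y):
    """1 - Pearson correlation: near 0 for profiles with the same shape
    regardless of amplitude, up to 2 for anti-correlated profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

# Two hypothetical profiles: same shape, but b has twice the amplitude.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]

print(euclidean_distance(a, b))  # ~5.48: Euclidean sees the amplitude gap
print(pearson_distance(a, b))    # ~0.0: Pearson sees identical shapes
```

The Euclidean metric separates the pair because their magnitudes differ, while the Pearson-based distance treats them as essentially identical.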
As is the case with statistical analysis, clustering techniques break down in the presence of noise (especially with the Pearson distance metric). It is therefore highly recommended to remove genes/assays with a noisy profile, such as genes expressed at low levels in the background range of the array. The cut-off can be arbitrary, or inclusion can be based on a previous statistical analysis. As a caution, be aware that performing a cluster analysis on genes that were found significant for separating two classes will merely illustrate results you have already discovered.
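As an illustration of such a filter, the sketch below drops genes whose intensities stay in the background range; the expression values and the cut-off are hypothetical, and the "above background in every array" rule is just one possible inclusion criterion:

```python
# Hypothetical expression matrix: gene name -> intensity on each array.
expression = {
    "geneA": [850.0, 920.0, 780.0],   # well above background everywhere
    "geneB": [12.0, 15.0, 9.0],       # always in the background range
    "geneC": [300.0, 40.0, 510.0],    # dips into background on one array
}

BACKGROUND_CUTOFF = 50.0  # arbitrary, array-dependent threshold

# Keep a gene only if it exceeds background on every array;
# a more permissive rule might require only a fraction of arrays.
filtered = {gene: values for gene, values in expression.items()
            if all(v > BACKGROUND_CUTOFF for v in values)}

print(sorted(filtered))  # → ['geneA']
```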
Once you decide on an appropriate distance metric, you will have to select the method to perform the classification. The most commonly used methods are:
• hierarchical clustering (Eisen et al., 1998; Spellman et al., 1998)
• k-means clustering (Theilhaber et al., 2002)
• self-organizing tree (Dopazo and Carazo, 1997; Herrero et al., 2001)
• self-organizing maps (Tamayo et al., 1999)
• principal component analysis (Raychaudhuri et al., 2000).
Hierarchical clustering is similar to a phylogenetic algorithm in that it computes the distance between every pair of genes or samples and joins the closest pair. It then recomputes all the distances between the clusters, including the newly formed pair, and keeps joining the closest pairs until only one big group is left. Because it grows a tree iteration by iteration, this algorithm is classified as 'agglomerative' (Plate II, see color section between pages 64 and 65).
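The agglomerative loop described above can be sketched in a few lines of Python. The gene profiles, the Euclidean metric, and the single-linkage joining rule are illustrative assumptions, not the implementation used in the cited papers:

```python
import math

def agglomerate(profiles):
    """Single-linkage agglomerative clustering of expression profiles.
    Repeatedly joins the closest pair of clusters until one group remains;
    returns the merge order as a nested tuple of profile indices."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Each cluster: (label, list of member profiles)
    clusters = [(i, [p]) for i, p in enumerate(profiles)]
    while len(clusters) > 1:
        # Find the closest pair (minimum distance between any two members).
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: min(dist(a, b)
                               for a in clusters[ij[0]][1]
                               for b in clusters[ij[1]][1]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Four hypothetical genes measured on two arrays: two tight groups.
genes = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.2, 5.1)]
print(agglomerate(genes))  # → ((0, 1), (2, 3))
```

The nested tuple records the merge order, which is exactly the tree a dendrogram would draw.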
K-means clustering, self-organizing maps and self-organizing trees are divisive algorithms: they start with the whole data set and split it into clusters. For example, with k-means, the user specifies how many clusters they think the dataset contains, and the software randomly assigns genes to a cluster. It then iteratively computes the average of each cluster and reassigns every gene to the cluster it is most similar to. After a few hundred iterations, the cluster averages stabilize and all the genes are assigned to their closest cluster. Since these algorithms start by selecting and clustering genes at random, they do not always yield the same results. Some methods repeat the procedure a dozen to a hundred times and report the consensus clusters (the genes that cluster together most of the time).
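A minimal sketch of this repeat-and-vote idea follows; the one-dimensional expression values, the choice of k, and the number of runs are assumptions made for illustration:

```python
import random
from collections import Counter

def kmeans(points, k, iters=100, seed=None):
    """Basic k-means: random initial assignment, then iterate
    'compute cluster averages / reassign each gene to the nearest average'."""
    rng = random.Random(seed)
    assign = [rng.randrange(k) for _ in points]
    for _ in range(iters):
        means = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                means.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                means.append(rng.choice(points))  # re-seed an empty cluster
        new = [min(range(k),
                   key=lambda c: sum((x - m) ** 2 for x, m in zip(p, means[c])))
               for p in points]
        if new == assign:   # assignments stable: converged
            break
        assign = new
    return assign

# Hypothetical 1-D expression values forming two obvious groups.
points = [(1.0,), (1.2,), (0.9,), (8.0,), (8.3,), (7.9,)]

# Repeat with different random starts; count how often each pair co-clusters.
runs = 20
pair_counts = Counter()
for seed in range(runs):
    assign = kmeans(points, k=2, seed=seed)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if assign[i] == assign[j]:
                pair_counts[(i, j)] += 1

# Consensus pairs: genes that cluster together in most runs.
consensus = {pair for pair, n in pair_counts.items() if n > runs / 2}
print(sorted(consensus))
```

On well-separated data like this, every run converges to the same split, so the consensus pairs are exactly the within-group pairs; on noisier data the voting step is what rescues a stable answer from the random starts.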
Nonetheless, even if you are able to reproduce the same result within