Prelić et al., 2006). The rationale behind the bicluster approach is to better match the
concept of “regulatory module”: a subset of genes that are affected (positively or
negatively) in a subset of the conditions. This flexibility raises almost intractable issues
relating to both the huge number of possible biclusters and the difficulty of finding
a relevant criterion to score them.
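To give a sense of scale, the short sketch below counts candidate biclusters under one simple assumption that is not taken from the text: a bicluster is any pair made of a non-empty subset of genes and a non-empty subset of conditions, with no further constraint. The function name `count_biclusters` is hypothetical.

```python
def count_biclusters(n_genes, n_conditions):
    """Count (gene subset, condition subset) pairs, both non-empty.

    Assumption: every such pair is a candidate bicluster.
    A set of size n has 2**n - 1 non-empty subsets.
    """
    return (2 ** n_genes - 1) * (2 ** n_conditions - 1)


# 7 * 3 = 21 candidates for only 3 genes and 2 conditions
tiny = count_biclusters(3, 2)

# For a realistically sized data set the count has thousands of digits,
# so exhaustive enumeration is out of the question.
n_digits = len(str(count_biclusters(6000, 100)))
print(tiny, n_digits)
```

Under this assumption the search space grows exponentially in both dimensions, which is why practical biclustering methods have to rely on heuristics and on a scoring criterion that can be optimized without enumerating candidates.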
Another interesting alternative to traditional clustering methods is offered by
“component analysis” approaches, in particular, principal component analysis
(PCA) and independent component analysis (ICA). PCA and the related singular
value decomposition are the simplest and most popular dimensionality reduction
techniques that consist of projecting a set of elements (here the genes) from a
high-dimensional space (the space of conditions or hybridizations) onto a low-
dimensional space (the principal components) (Wall et al., 2003). Principal components,
or “eigengenes” in the terminology of Alter et al. (2000), are ordered in
the sense that the ith principal component captures less variance than the (i − 1)th but
still as much as possible of the residual variance, that is, the variance not already
accounted for by the first i − 1 principal components. ICA is a more complicated
approach which enforces a statistical independence constraint, instead of
orthogonality (uncorrelatedness) in the PCA; it has been argued to be more relevant
for biological problems and in practice produces a different decomposition. The
underlying factors identified by ICA were termed “expression modes” by
Liebermeister (2002). In both PCA and ICA, the coordinates of the new components
can serve to define gene clusters that can overlap (Lee and Batzoglou, 2003). Network
component analysis is yet another “component analysis” method, but it cannot be
used for cluster discovery because the connectivity matrix, which defines whether a particular
gene is connected to/regulated by a particular underlying component/transcription
factor, is given beforehand. Instead, from the activity of the target genes,
the method estimates the activity of each transcription factor and the strength of each
regulatory interaction (Liao et al., 2003).
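The PCA projection described above can be sketched with NumPy's singular value decomposition. This is a minimal illustration on simulated data, not the procedure of any cited paper. The function names, the simulated matrix, and the z-score threshold used to turn component coordinates into (possibly overlapping) gene clusters are all assumptions made for the example.

```python
import numpy as np


def pca_eigengenes(expr, n_components=2):
    """PCA of a genes x conditions matrix via SVD.

    Returns gene coordinates (scores), the leading principal axes in
    condition space ("eigengenes"), and the fraction of variance
    captured by every component, in decreasing order.
    """
    # Center each condition so components describe variation across genes.
    centered = expr - expr.mean(axis=0, keepdims=True)
    # centered = u @ diag(s) @ vt; rows of vt are orthonormal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components], explained


def clusters_from_scores(scores, z=1.5):
    """Illustrative rule (an assumption): for each component, select the
    genes whose standardized coordinate exceeds z in absolute value.
    A gene may be selected by several components, so clusters can overlap.
    """
    zscores = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return [np.flatnonzero(np.abs(zscores[:, k]) > z)
            for k in range(scores.shape[1])]


# Simulated data: 200 genes x 6 conditions, with 40 genes induced in the
# first three conditions and 30 genes repressed in the last two.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 6))
expr[:40, :3] += 3.0
expr[30:60, 4:] -= 3.0

scores, eigengenes, explained = pca_eigengenes(expr, n_components=2)
clusters = clusters_from_scores(scores)
print(np.round(explained, 3))
print([len(c) for c in clusters])
```

Replacing the SVD step with an ICA routine would change the constraint from orthogonality to statistical independence, while the thresholding step could stay the same.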
We end this tour by mentioning the great potential of combining transcriptome
analysis and the search for transcription factor binding sites. This combined
approach leads closer to gene regulatory network discovery than the previous
approaches, as a molecular basis for the regulation is proposed alongside a list of regulated
genes. Its simplest implementation consists of applying motif discovery algorithms
such as MEME (Bailey and Elkan, 1994) or MDscan (Liu et al., 2002) to the
promoter regions upstream of genes identified as an expression cluster or as differentially
expressed. In this case, a DNA motif may become detectable because the set
of sequences subjected to the analysis has been enriched in this motif. Alternative
methods that reverse the perspective of traditional motif discovery algorithms, by
trying to model the expression pattern as a function of the sequence instead of
focusing on sequence modelling, have also been proposed
and show promising results. A clear advantage is that they can more easily bypass
the need to delineate the exact contours of each cluster in which a motif is searched
for. REDUCE (Bussemaker et al., 2001; Foat et al., 2006), for instance, makes use of
a continuous expression trait such as fold-change in a differential expression analysis