probing similar conditions. Comprehensive data have been used to
provide functional links for unclassified genes,[3,6-9] to predict novel
cis-regulatory elements,[7,10-12] and to elucidate the structure of the
transcriptional program.[12,13]
Large-scale expression data may result from systematic efforts to
characterize a range of transcription states by testing many different
biological conditions.[6,13,14] In addition, large datasets can be
assembled by collecting expression profiles and pooling them into one
comprehensive database [Fig. 1(b)]. Until recently, these data appeared
in different formats and were scattered among various internet sites (if
available at all).[4] The increasing availability of microarray technology
and the ensuing explosion of available expression profiles (usually
obtained in different laboratories using different array technologies)
have prompted the establishment of standardized annotations such as the
MIAME[15] and MAGE-ML[16] standards, and a number of public repositories
for chip data.[17-20]
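
As a rough illustration of what pooling profiles into one comprehensive dataset can look like, the sketch below merges per-source expression profiles into a single gene-by-condition table. The function name pool_profiles, the input layout, and the gene and condition names are hypothetical and chosen only for this example; real repositories differ in format, and normalization across array technologies is not addressed here.

```python
def pool_profiles(datasets):
    """Merge per-source profiles into one gene-by-condition table.

    `datasets` maps a source name to {condition: {gene: value}}.
    Returns (genes, conditions, matrix); genes not measured in a
    given condition are recorded as None.
    """
    conditions = []
    columns = []
    genes = set()
    for source, profiles in datasets.items():
        for condition, profile in profiles.items():
            # Prefix the condition with its source to keep names unique.
            conditions.append(f"{source}:{condition}")
            columns.append(profile)
            genes.update(profile)
    genes = sorted(genes)
    # One row per gene, one column per pooled condition.
    matrix = [[column.get(gene) for column in columns] for gene in genes]
    return genes, conditions, matrix


# Made-up toy data from two hypothetical laboratories.
lab_a = {"heat_shock": {"GENE1": 1.2, "GENE2": -0.4}}
lab_b = {"starvation": {"GENE2": 0.9, "GENE3": 2.1}}

genes, conditions, matrix = pool_profiles({"lab_a": lab_a, "lab_b": lab_b})
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Missing entries are kept explicit (None) rather than filled in, since profiles obtained on different arrays need not cover the same genes.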
Single microarray experiments are global only in the sense that the
genes probed span all or most of the genome. The idea behind composing
large-scale expression datasets is to include a large variety of
conditions so as to also span the range of transcriptional states of
the cell. While this is a necessary step towards the elucidation of the
transcription programs, such data present new and serious challenges to
the mathematical and computational tools used to analyze them. In
particular, the context-specific nature of regulatory relationships poses
a difficult computational problem. Consequently, a sizeable variety of
different approaches has been proposed in the literature (see the review
by Ihmels and Bergmann[21]).
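
To see why context-specific relationships are hard to detect in a large pooled dataset, consider the toy sketch below. It is not a method from the text: the two expression profiles are made-up numbers, and the Pearson correlation is used only as a familiar similarity measure. The two hypothetical genes vary together in the first four conditions and in opposite directions in the last four, so a correlation computed over all conditions hides the relationship.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up profiles over eight conditions (hypothetical values).
gene_a = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
gene_b = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]

context = slice(0, 4)  # the subset of conditions where the genes co-vary

r_all = pearson(gene_a, gene_b)
r_context = pearson(gene_a[context], gene_b[context])
print("all conditions:", r_all)
print("within context:", r_context)
```

In this example the correlation within the chosen context is perfect, while the correlation over all eight conditions vanishes. In real data the relevant subset of conditions is not known in advance, which is what makes the problem difficult.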
1.1. The Modular Concept
Whenever we face a large number of individual elements that have
heterogeneous properties, grouping elements with similar properties can
help to obtain a better understanding of the entire ensemble. For
example, we may attribute human individuals of a large cohort to
different groups based on their sex, age, profession, etc. in order to obtain