In the next two sections, we give some details on pre-processing methods and then discuss the main analysis methods.
Micro-Array Data Pre-Processing
Raw intensities (stored in so-called CEL files in the case of Affymetrix micro-arrays) need to be pre-processed in order to reduce errors and variability. A normalization step is then mandatory, since the experimental process often introduces considerable noise into the data. Normalization methods are generally tailored to a particular kind of micro-array.
The LOWESS (Locally Weighted Linear Regression) method (Cleveland and Devlin, 1979) is used for two-channel micro-arrays.
The RMA (Robust Multi-array Average) method (Irizarry, 2003) is suited to sets of arrays with replicates (multi-array data); it applies a quantile transformation so that all arrays share the same quantiles, that is, the same intensity distribution.
Standards and Ontologies for Micro-Array Experiments
The scale and complexity of micro-array experiments require the adoption of standards and ontologies so that the resulting data can be fully exploited. Here is a list of some commonly used standards (see also (Stoeckert et al., 2002)):
- Gene Ontology (Ashburner et al., 2000) is a controlled vocabulary widely used to describe genes and gene products (their function and location),
- MIAME (Brazma et al., 2001) is a standard that defines the minimal information required to understand the experiment and the data,
- MINiML (MIAME Notation in Markup Language) is a data exchange format optimized for micro-array gene expression data. It is defined in XML and can capture all MIAME information,
- MAGE is a standard for the representation of micro-array expression data, intended to facilitate the exchange of micro-array information between different data systems. For example, MAGE-OM is a stable specification for the standard representation of data in a database, and MAGE-ML defines a common format for data transfer from one database to another.
Micro-Array Data Analysis
Differentially Expressed Genes
One standard analysis of gene expression data consists of searching for genes that behave differently under two given conditions (control/treatment, time1/time2, ...). Statistical hypothesis tests are the most commonly used tools.
The fold-change, calculated for each gene as the ratio of its average expression under one condition to its average expression under another, may be used to determine which genes are differentially expressed, but it depends on an arbitrary threshold. Standard t-tests are also performed as hypothesis tests. A very frequent approach is to perform an Analysis of Variance (ANOVA), either one-way or two-way, which is suited to data sets with multiple samples. An ANOVA returns a p-value as a level of significance that a gene, or a group of genes, is differentially expressed: the smaller the p-value, the stronger the evidence. A one-way ANOVA is applied when a single criterion influences the data variance.
Co-Expressed Genes
Another frequent analysis of micro-array data searches for genes that behave similarly across conditions; these are called co-expressed genes. Numerous clustering methods have been proposed to identify groups of co-expressed genes, and hierarchical methods in particular have been developed to address the specific issues of these data.
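The pre-processing and analysis steps discussed in this section can be illustrated on toy data. The following pure-Python sketch is a minimal illustration under stated assumptions, not the reference implementation of RMA or of any package: it assumes background-corrected intensities, covers only the quantile step of RMA (without tie handling), computes only the one-way ANOVA F statistic (turning it into a p-value requires the F distribution), and uses average-linkage clustering with 1 - r (Pearson correlation) as the distance, which is one common choice among hierarchical methods. All function names are hypothetical.

```python
# Toy sketch of a micro-array workflow, in pure Python.
# Assumptions (not from the source): background-corrected intensities, no tie
# handling in the quantile step, and hypothetical function names throughout.
from statistics import mean


def quantile_normalize(arrays):
    """Quantile normalization (the normalization step of RMA, ties ignored).

    arrays: one list of gene intensities per array, all of the same length.
    The mean of the k-th smallest intensity over all arrays is assigned to
    the gene ranked k in each array, so every array ends up with the same values.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        ranking = sorted(range(n), key=lambda i: a[i])  # gene indices, low to high
        out = [0.0] * n
        for rank, gene in enumerate(ranking):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def fold_change(cond_a, cond_b):
    """Ratio of the average expression under condition A to that under B."""
    return mean(cond_a) / mean(cond_b)


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for one gene; groups = replicates per condition."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither profile is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative hierarchical clustering, average linkage, distance 1 - r.

    profiles: dict mapping a gene name to its expression values over conditions.
    Repeatedly merges the two closest clusters until n_clusters remain.
    """
    genes = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in genes for b in genes}
    clusters = [{g} for g in genes]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(dist[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge j into i (j > i, so index i stays valid)
        del clusters[j]
    return clusters


if __name__ == "__main__":
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
    # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
    print(fold_change([4.0, 6.0], [1.0, 3.0]))  # 2.5
    print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # 13.5
```

On real data, `scipy.stats.f_oneway` returns both the F statistic and its p-value, and `scipy.cluster.hierarchy.linkage(..., method="average", metric="correlation")` builds the full dendrogram; the hand-written versions above only make the computations visible.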