downloaded from public repositories or may be the result of local experiments. For instance, Table 9 gives an excerpt of ANOVA results from a search for genes differentially expressed between 7h and 0h under Control conditions in the GSE6281 series described previously. The first three columns describe the gene probeset, the fourth column gives the intensity ratio (fold change) between the conditions, the fifth column gives the p-value, and the last column gives the differential expression level (+ or - for up- or down-regulation) deduced from a threshold.
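As a rough illustration of how an expression-level column can be deduced from thresholds, the sketch below labels a probeset "+" or "-" only when its p-value passes a significance cutoff and its ratio passes a fold-change cutoff. The text does not state the rule or thresholds actually used for Table 9: the cutoffs (p < 0.05, ratio >= 1.5 or <= 1/1.5), the convention that a ratio above 1 means up-regulation, and the function name `call_regulation` are all assumptions.

```python
from typing import Optional

# Hypothetical cutoffs: the text only says the level is "deduced from a threshold".
P_CUTOFF = 0.05
FOLD_CUTOFF = 1.5


def call_regulation(fold_change: float, p_value: float,
                    p_cutoff: float = P_CUTOFF,
                    fold_cutoff: float = FOLD_CUTOFF) -> Optional[str]:
    """Return '+' (up), '-' (down) or None (no call) for one probeset.

    Assumes fold_change is a ratio test/reference (e.g. 7h/0h), so values
    above 1 are read as up-regulation.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be a positive ratio")
    if p_value >= p_cutoff:
        return None  # not statistically significant
    if fold_change >= fold_cutoff:
        return "+"
    if fold_change <= 1.0 / fold_cutoff:
        return "-"
    return None  # significant, but the change is too small to call
```

With these hypothetical cutoffs, `call_regulation(2.0, 0.01)` gives "+", `call_regulation(0.4, 0.001)` gives "-", and `call_regulation(1.2, 0.001)` gives no call. Tightening either cutoff reduces the number of genes that are called.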
In AMI, we keep only the synthetic data derived from p-values and fold changes, which indicate the up- or down-regulation of genes between two conditions. Like raw expression intensity data, these data are stored in relational tables.
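To illustrate keeping only the synthetic up/down calls in a relational table, here is a minimal sketch using Python's built-in sqlite3 module. The table name `diff_expression` and its columns are hypothetical, not the actual AMI schema; the two rows reuse the probesets, symbols and calls listed in Table 9 for the GSE6281 comparison.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create an in-memory store for differential expression calls (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE diff_expression (
            series      TEXT NOT NULL,   -- GEO series accession
            probe       TEXT NOT NULL,   -- probeset identifier
            symbol      TEXT,
            cond_ref    TEXT NOT NULL,   -- reference condition (e.g. 0h)
            cond_test   TEXT NOT NULL,   -- compared condition (e.g. 7h)
            regulation  TEXT NOT NULL CHECK (regulation IN ('+', '-')),
            PRIMARY KEY (series, probe, cond_ref, cond_test)
        )
        """
    )
    return conn


conn = build_store()
# Only the synthetic call is stored, not the fold change or the p-value.
rows = [
    ("GSE6281", "240717_at", "ABCB5", "0h", "7h", "+"),
    ("GSE6281", "232081_at", "ABCG1", "0h", "7h", "-"),
]
conn.executemany("INSERT INTO diff_expression VALUES (?, ?, ?, ?, ?, ?)", rows)

# Retrieve the genes called up-regulated between 0h and 7h.
up = conn.execute(
    "SELECT symbol FROM diff_expression "
    "WHERE cond_ref = '0h' AND cond_test = '7h' AND regulation = '+'"
).fetchall()
```

Storing one row per (probe, condition pair) keeps the table narrow, so queries such as "all genes up-regulated between two conditions" become simple filters.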
Representation of Statistical Results
As stated in the AMI overview, another main goal in designing the semantic AMI data warehouse is to retain the synthetic data produced by statistical analyses, in a form that facilitates information retrieval on fuzzy and semantic criteria. In a first stage, we plan to consider two kinds of statistical and data mining results: differential gene expression between two given conditions, and clustering models built on gene expression intensities. Differential gene expression results are stored in relational tables, and clustering models are stored as XML representations (represented by (5) in Figure 1); each is discussed in the following paragraphs.
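Since clustering models are to be stored as XML, the sketch below shows what a small center-based clustering model might look like in PMML, the XML-based model language from the Data Mining Group, built with Python's standard xml.etree.ElementTree. The element names (DataDictionary, ClusteringModel, MiningSchema, ComparisonMeasure, ClusteringField, Cluster, Array) and the PMML 4.4 namespace follow the public PMML specification, but attribute details should be checked against the schema; the field names, cluster centers and the function `clustering_pmml` are hypothetical, and no AMI-specific extension is shown.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"


def clustering_pmml(fields, centers):
    """Build a minimal center-based PMML ClusteringModel as an XML string.

    fields:  names of the conditions used as clustering dimensions
    centers: dict mapping each cluster name to its center (one value per field)
    """
    # The default namespace is written as a plain xmlns attribute on the root,
    # so child elements can use unqualified tag names.
    root = ET.Element("PMML", xmlns=PMML_NS, version="4.4")
    ET.SubElement(root, "Header", description="Gene expression clustering (example)")

    # Declare each condition as a continuous input field.
    dictionary = ET.SubElement(root, "DataDictionary", numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dictionary, "DataField", name=name,
                      optype="continuous", dataType="double")

    model = ET.SubElement(root, "ClusteringModel", functionName="clustering",
                          modelClass="centerBased",
                          numberOfClusters=str(len(centers)))
    schema = ET.SubElement(model, "MiningSchema")
    for name in fields:
        ET.SubElement(schema, "MiningField", name=name)

    # Distance used to assign a gene profile to its nearest center.
    measure = ET.SubElement(model, "ComparisonMeasure", kind="distance")
    ET.SubElement(measure, "squaredEuclidean")
    for name in fields:
        ET.SubElement(model, "ClusteringField", field=name)

    # One Cluster element per center, coordinates listed in field order.
    for cluster_name, center in centers.items():
        if len(center) != len(fields):
            raise ValueError(f"center of {cluster_name!r} has the wrong length")
        cluster = ET.SubElement(model, "Cluster", name=cluster_name)
        array = ET.SubElement(cluster, "Array", n=str(len(center)), type="real")
        array.text = " ".join(str(v) for v in center)

    return ET.tostring(root, encoding="unicode")


xml_text = clustering_pmml(["t0h", "t7h"], {"c1": [0.1, 0.9], "c2": [1.2, -0.3]})
```

Because the result is plain XML, it can be stored as a document and later enriched with semantic annotations without changing the model itself.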
Data Mining Models of Gene Expressions
For storage and intelligent retrieval of data mining models, standard representation formats such as XML and PMML, and semantic annotation formats such as RDF, are perfectly suited to AMI requirements. The Predictive Model Markup Language (PMML) is an XML-based language that allows applications to define statistical and data mining models and to share them between PMML-compliant applications. It was defined by the Data Mining Group (footnote 18). In this section, we present the PMML extensions we have defined for clustering models; RDF annotations are detailed in the next sections. A PMML clustering model as illustrated by Figure
Differentially Expressed Genes
Gene expression data tables have far more rows (genes) than columns (conditions), for instance 50,000 genes for 50 conditions, and pairwise analyses of differential expression between conditions also produce a huge amount of resulting data. The search for differentially expressed genes is frequently performed with one-way or two-way ANOVA. As presented previously (see section "MICRO-ARRAYS EXPERIMENTS"), an ANOVA method provides its results as a list of genes with their p-values for two conditions.
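To put the volume in figures: 50 conditions give 50 × 49 / 2 = 1,225 condition pairs, so 50,000 genes yield 61,250,000 gene-level p-values. The sketch below shows a per-gene one-way ANOVA over groups of replicate intensities using only the Python standard library; the p-value is the upper tail of the F distribution, computed through the regularized incomplete beta function (continued-fraction evaluation in the style of Numerical Recipes). The function names and sample intensities are hypothetical, and in practice a library routine such as `scipy.stats.f_oneway` would normally be used instead.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    return _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA over groups of replicate values."""
    k, n = len(groups), sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0.0:
        raise ValueError("replicates show no within-group variance")
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, f_sf(f, k - 1, n - k)


def anova_by_gene(expr_a, expr_b):
    """List (gene, p-value) for two conditions, given {gene: replicate values}."""
    return [(gene, one_way_anova(expr_a[gene], expr_b[gene])[1]) for gene in expr_a]


f_stat, p = one_way_anova([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])  # F is 13.5 here
results = anova_by_gene({"g1": [1.0, 2.0, 3.0]}, {"g1": [4.0, 5.0, 6.0]})
```

With exactly two conditions, a one-way ANOVA is equivalent to a two-sample t-test with pooled variance (F equals t squared), which is why its output reduces to one p-value per gene and condition pair.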
Table 9. Example of results after an ANOVA on gene expression
Probe      | Symbol | Description                                  | Fold change 7h/0h | p-value 7h/0h | Exp (+/-)
240717_at  | ABCB5  | ATP-binding cassette, sub-family B (MDR/TAP) | 0.5235            | 0.01219       | +
232081_at  | ABCG1  | ATP-binding cassette, sub-family G (WHITE)   | 1.6253            | 0.00124       | -
...        | ...    | ...                                          | ...               | ...           | ...
 