gene profiles for the entire scientific community.
Thus, biologists have recently found it useful to exploit these archives of expression responses for several purposes.
Despite the genome-wide scope of micro-arrays and their expression data, results published in the scientific literature generally focus on the first hundred differentially expressed genes out of the thousands in a whole genome, and detailed discussion is devoted to only about ten of them. Novel statistical analyses may therefore be performed on archived data to explore them more deeply, and either confirm the original results or discover new knowledge.
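As a minimal sketch of such a re-analysis, assuming an archived data set can be loaded as a mapping from gene identifiers to expression values in two conditions, every gene can be ranked by a test statistic instead of stopping at the first hundred. Welch's t statistic is used here only as one common choice; it is not claimed to be the method of the original studies or of AMI, and the gene names and values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(treated, control):
    """Welch's t statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se


def rank_genes(expr):
    """Rank every gene by |t|, largest first.

    expr maps a gene id to a (treated_values, control_values) pair.
    The result covers all genes in the archive, not only the first hundred.
    """
    scores = [(gene, welch_t(t, c)) for gene, (t, c) in expr.items()]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical archived data set: log2 intensities, 3 treated vs 3 control arrays.
archived = {
    "GENE_A": ([8.1, 8.3, 8.0], [6.0, 6.2, 5.9]),
    "GENE_B": ([5.0, 5.1, 4.9], [5.0, 5.2, 4.8]),
    "GENE_C": ([4.0, 4.4, 4.2], [7.1, 6.9, 7.0]),
}

for gene, t in rank_genes(archived):
    print(f"{gene}\t{t:+.2f}")
```

With only a few arrays per condition, a real re-analysis would typically also use a moderated variance estimate and a multiple-testing correction; the sketch omits both for brevity.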
A second use of these public archives is comparative analysis: new micro-array experimental data may be compared with previous data in order to highlight similar and specific responses to a particular biological test.
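One simple way to make "similar" and "specific" concrete, assuming each experiment has already been reduced to a set of responsive genes, is to compare these sets directly: shared genes indicate a similar response, genes found only in the new experiment indicate a specific one, and the Jaccard index summarizes the overlap. The function name and gene sets below are hypothetical illustrations, not part of any archive's API.

```python
def compare_responses(new_genes, previous):
    """Compare the responsive genes of a new experiment with archived ones.

    new_genes: set of gene ids found responsive in the new experiment.
    previous:  dict mapping an archived experiment id to its set of responsive genes.
    Returns, per archived experiment, the shared genes (similar response), the genes
    specific to the new experiment, and the Jaccard index of the two sets.
    """
    report = {}
    for exp_id, genes in previous.items():
        shared = new_genes & genes
        union = new_genes | genes
        report[exp_id] = {
            "shared": shared,
            "specific": new_genes - genes,
            "jaccard": len(shared) / len(union) if union else 0.0,
        }
    return report


# Hypothetical gene sets.
new_genes = {"G1", "G2", "G3", "G4"}
previous = {"EXP_1": {"G1", "G2", "G5"}, "EXP_2": {"G4"}}

report = compare_responses(new_genes, previous)
# List archived experiments from most to least similar to the new one.
for exp_id, r in sorted(report.items(), key=lambda kv: kv[1]["jaccard"], reverse=True):
    print(exp_id, sorted(r["shared"]), sorted(r["specific"]), round(r["jaccard"], 2))
```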
A third use of these expression data sets is to combine several of them in a new meta-analysis. To highlight similar and specific biological responses to a particular biological test, it seems promising to analyze, across studies, the largest possible set of related data. Work on combining multiple data sets, and on the issues this raises, has focused either on differential expression (Rhodes et al., 2002; Choi et al., 2003) or on co-expressed genes (Eisen et al., 1998; Lee et al., 2004). Hong and Breitling (2008) evaluated three statistical methods for integrating different micro-array data sets and concluded that meta-analyses can be powerful but must be conducted carefully.
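To illustrate what combining differential-expression evidence across studies can look like, here is a minimal sketch of Fisher's method, one classic way to merge independent per-gene p-values. It is shown only as an example of the technique; it is not presented as the approach recommended by the works cited above. The p-values are hypothetical and assumed to come from independent studies.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values for one gene with Fisher's method.

    Under the null hypothesis X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom (k = number of studies). For even
    degrees of freedom the upper tail has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!, so no external library is needed.
    Every p-value must lie in (0, 1].
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    term = tail = 1.0  # j = 0 term of the series
    for j in range(1, k):
        term *= half / j  # (x/2)**j / j!, built incrementally
        tail += term
    return x, math.exp(-half) * tail


# Hypothetical per-gene p-values from three independent studies.
study_p = {"G1": [0.01, 0.04, 0.20], "G2": [0.50, 0.70, 0.30]}
for gene, ps in study_p.items():
    stat, p = fisher_combine(ps)
    print(f"{gene}: X2 = {stat:.2f} on {2 * len(ps)} df, combined p = {p:.3g}")
```

Fisher's method assumes independent studies testing the same hypothesis; it ignores the direction of change, which is one reason such combinations have to be interpreted carefully.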
Nevertheless, biologists who want to study micro-array data and extract novel knowledge face a very complex task. Manually navigating the huge amounts of diverse data stored in these public repositories is so tedious that they end up conducting restricted studies and drawing limited conclusions. Systems such as GEO at the NCBI³, ArrayExpress⁴ at the EBI⁵, Gemma⁶ or GenePattern⁷ allow investigators to share data and analysis results, and they provide user-friendly tools for analyzing global expression data such as those collected in DNA micro-array experiments.
Two critical issues nevertheless remain: directly combining data sets derived from different experimental processes and micro-array platforms, and taking advantage of the whole set of related information.
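As a rough sketch of why direct combination is delicate, assume two platforms that use different probe identifiers and measurement scales. One simple approach, not necessarily the one adopted in AMI, is to map probes to genes, average probes of the same gene, standardize values within each data set (z-scores), and keep only the genes measured everywhere. The probe identifiers, annotations and values below are hypothetical.

```python
from statistics import mean, pstdev


def to_gene_level(probe_values, probe_to_gene):
    """Collapse probe-level values to gene-level values by averaging the probes
    annotated to the same gene; unannotated probes are dropped."""
    grouped = {}
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            grouped.setdefault(gene, []).append(value)
    return {gene: mean(values) for gene, values in grouped.items()}


def standardize(gene_values):
    """Z-score the values within one data set so that platforms measured on
    different scales become comparable (assumes values are not all equal)."""
    values = list(gene_values.values())
    mu, sd = mean(values), pstdev(values)
    return {gene: (v - mu) / sd for gene, v in gene_values.items()}


def combine(datasets):
    """Keep only genes measured in every data set: gene -> list of z-scores."""
    common = set.intersection(*(set(d) for d in datasets))
    return {gene: [d[gene] for d in datasets] for gene in sorted(common)}


# Hypothetical data from two platforms with different probe sets and scales.
platform_a = {"A_001": 10.0, "A_002": 12.0, "A_003": 5.0, "A_004": 7.0}
annot_a = {"A_001": "G1", "A_002": "G1", "A_003": "G2"}  # A_004 unannotated
platform_b = {"B_1": 200.0, "B_2": 50.0, "B_3": 125.0}
annot_b = {"B_1": "G1", "B_2": "G2", "B_3": "G3"}

z_a = standardize(to_gene_level(platform_a, annot_a))
z_b = standardize(to_gene_level(platform_b, annot_b))
print(combine([z_a, z_b]))
```

Standardizing within each data set removes scale differences but not batch effects or differences in probe specificity, which is part of what makes cross-platform combination a critical point.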
In this context, our approach aims to enable meta-analyses that involve multiple types of source data, including aggregated or synthetic data, together with their semantic aspects.
This chapter presents the semantic data warehousing approach AMI (Analysis Memory for Immunosearch) that we designed in order to facilitate storage and intelligent querying of:
- gene expression data from multiple experiments,
- refined data (aggregate or synthetic) resulting from statistical analyses and data mining methods,
- data and metadata representing all related information from the biological domain.
All these kinds of information may be considered dimensions of the semantic data warehouse; the refined data, in particular, may be considered the counterpart of facts in a standard data warehouse. The key idea is to exploit semantic relationships among metadata when querying this data warehouse, in order to provide relevant comparative analyses. The technical solutions adopted in the AMI knowledge base and search engine rely on semantic web techniques such as semantic annotation languages and their underlying ontologies.
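The interplay between facts, dimensions and semantic relationships can be sketched as follows, assuming refined results are stored as fact records, experiments as a dimension annotated with ontology concepts, and the ontology reduced to a toy "is-a" table. Querying a concept then also retrieves experiments annotated with its sub-concepts. In a semantic web setting this would typically be expressed with ontology languages and a dedicated query engine; this standard-library version only mimics the idea, and every class, concept and identifier in it is hypothetical rather than part of AMI.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: child concept -> parent concept ("is-a").
IS_A = {
    "allergic contact dermatitis": "skin inflammation",
    "irritant contact dermatitis": "skin inflammation",
    "skin inflammation": "skin reaction",
    "skin sensitization": "skin reaction",
}


def descendants(concept):
    """Return the concept plus every concept below it in the is-a hierarchy."""
    found = {concept}
    changed = True
    while changed:  # repeat until no new sub-concept is discovered
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


@dataclass(frozen=True)
class Experiment:  # dimension: experiment metadata
    exp_id: str
    condition: str  # annotated with an ontology concept


@dataclass(frozen=True)
class Result:  # fact: refined data produced by a statistical analysis
    exp_id: str
    gene: str
    log2_fold_change: float
    p_value: float


def query(facts, experiments, concept, max_p=0.05):
    """Select significant facts whose experiment is annotated with the concept
    or with any of its sub-concepts."""
    wanted = descendants(concept)
    exp_ids = {e.exp_id for e in experiments if e.condition in wanted}
    return [f for f in facts if f.exp_id in exp_ids and f.p_value <= max_p]


experiments = [
    Experiment("E1", "allergic contact dermatitis"),
    Experiment("E2", "irritant contact dermatitis"),
    Experiment("E3", "skin sensitization"),
]
facts = [
    Result("E1", "G1", 2.1, 0.001),
    Result("E2", "G1", 1.7, 0.01),
    Result("E2", "G2", 0.2, 0.40),
    Result("E3", "G1", 1.2, 0.03),
]

for fact in query(facts, experiments, "skin inflammation"):
    print(fact)
```

A plain keyword match on "skin inflammation" would find no experiment here, since none is annotated with exactly that term; the is-a expansion is what brings E1 and E2 into the comparison.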
The work carried out within the AMI project ultimately aims to provide scientists with semi-automatic tools that facilitate navigation and comparative analyses across a whole set of comparable experiments and multiple sources of information related to a particular biological process. This work was done in collaboration with the Immunosearch company⁸, whose projects focus on human biological responses to chemicals. As a first step, AMI is devoted to human skin biological reactions only.
3 Data warehousing approach for Genomics Data Analysis