Biomedical Engineering Reference
In-Depth Information
and NCBO BioPortal [9] host between them more than 300 ontologies for
describing things as diverse as medical terms for disease and loggerhead
turtle nesting behaviors. The Experimental Factor Ontology (EFO) [10]
was created at the EBI specifically to describe 'omics experiments and is
used widely in the curation and data analysis process. It is described in
detail in [11].
The EBI Atlas is constructed primarily for transcriptomics experiments.
In these data sets, one of the aspects of curation is to identify and mark
experimental variables being tested, for example disease, tissue, cell type
or drug response, and map these terms to the EFO hierarchy uniformly.
Once the terms are identified, statistical analysis is performed
automatically to identify significantly differentially expressed elements in
the transcriptome (e.g. genes, their transcripts, microRNA). The statistical
framework is described in [12].
In addition to these browsing requirements, which serve the majority
of the bench user community described in Figure 9.2, it is also useful to
be able to process the curated data sets with more sophisticated
algorithms that are not pre-computed automatically for all data
and may only be devised at a later date. To serve this purpose, the data in
the 'omics portal must be accessible through programmatic APIs so that
processing with other algorithms can be automated. The EBI Atlas
software again deals with this in an elegant way, by storing the data
in self-describing and accessible NetCDF format [13] files, and also
providing RESTful APIs to the curated data sets, as described in the
next section.
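As a rough illustration of programmatic access, the sketch below builds a query URL and decodes a JSON reply. It is not the Atlas API itself: the base URL, the parameter names `gene` and `factor`, and the `results` key in the response are all hypothetical placeholders chosen for this example, and no network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical placeholder; NOT the real Atlas endpoint.
BASE_URL = "https://example.org/atlas/api"


def build_query_url(gene=None, factor=None, base_url=BASE_URL):
    """Build a query URL from a gene and/or an experimental variable.

    The parameter names 'gene' and 'factor' are assumptions made for
    this sketch, not documented Atlas parameters.
    """
    params = {}
    if gene:
        params["gene"] = gene
    if factor:
        params["factor"] = factor
    if not params:
        raise ValueError("supply a gene, an experimental variable, or both")
    return base_url + "?" + urlencode(params)


def parse_results(payload):
    """Decode a JSON reply, assuming a top-level 'results' list."""
    return json.loads(payload)["results"]
```

A client would fetch the URL returned by `build_query_url` with any HTTP library and pass the response body to `parse_results`; the real parameter names and response layout should be taken from the API documentation.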
9.3 The EBI Atlas software
The Atlas was constructed to provide a simple, easy to use and understand
interface to the computed statistics. As such, the requirements are to be
able to query by gene, including any of its attributes such as synonyms,
protein domains, pathway and process annotations, or by curated
experimental variable, or both. Because users are expected to be unfamiliar with
the statistical underpinnings of the analysis, simple summaries of up-/
down-expression patterns are provided: for each gene, the number of
public data sets in which it is over- or under-expressed is reported, color-coded
red and blue, respectively. More sophisticated users can create
more complex queries and drill down to individual data points underlying
the analysis results.
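The per-gene summary described above can be sketched as a simple count of data sets. This is a minimal illustration, not the Atlas implementation: the record layout `(gene, data set, direction)`, the function name and the gene and data set identifiers are hypothetical, and the differential-expression calls are assumed to have been computed already.

```python
from collections import defaultdict


def summarise_expression(calls):
    """Count, per gene, the distinct data sets with up or down calls.

    `calls` is an iterable of (gene, dataset, direction) tuples, where
    direction is "up" or "down". This layout is an assumption made for
    the sketch.
    """
    up = defaultdict(set)
    down = defaultdict(set)
    for gene, dataset, direction in calls:
        if direction == "up":
            up[gene].add(dataset)
        elif direction == "down":
            down[gene].add(dataset)
    genes = sorted(set(up) | set(down))
    # A set per direction means a repeated call from the same data set
    # is counted once.
    return {g: {"up": len(up[g]), "down": len(down[g])} for g in genes}


# Placeholder identifiers, not real genes or experiments.
example_calls = [
    ("GENE_A", "DS1", "up"),
    ("GENE_A", "DS2", "up"),
    ("GENE_A", "DS3", "down"),
    ("GENE_B", "DS1", "down"),
    ("GENE_A", "DS1", "up"),  # duplicate call, counted once
]
summary = summarise_expression(example_calls)
```

In an interface like the one described, the `up` count would be shown in red and the `down` count in blue.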
 