Creating an in-house ’omics data portal using EBI Atlas software - Open Source Software in Life Science Research

to either developing an in-house portal or adapting open source code. In

collaborating with the Atlas team, we gained a great deal of expertise in

curation and data management of a large collection of gene expression

data, while at the same time developing a usable prototype that

serves as a proof-of-concept for future development. There are several

learnings we can take away from this experience.

The first is the importance of data curation to the development of a

useful portal. The EBI Atlas uses an ontology to control terms used in

data curation of the various gene expression studies contained therein.

This ontology makes the data searchable in a semantically meaningful

way, and also opens the door to deploying other semantics-aware

algorithms later on this large collection of

data. One example is that we could go through the entire data collection

and generate signatures of gene changes between normal and various

disease states. Once the signatures have been computed, a gene set

enrichment analysis [15] could be performed on each disease state to

reveal pathways involved in particular diseases. Being able to make a

query such as 'Find all genes upregulated consistently in breast

cancer' and then combine the results with other data, such as drug target

databases, would also be a very powerful use of this kind of semantically

aware data. Of course, this works well when all the data are

curated by subject matter experts, but what about in-house legacy data

sets? It is difficult to get consistent annotation and curation of these

data sets, since in many cases the original scientists may have left the

company a long time ago. It is also helpful to have good user-friendly

tools to enable easy curation of experimental meta-data, and these are

sorely lacking as yet. In this project, a combination of in-house

bioinformatics expertise and computational biologists who are familiar

with the subject matter (e.g. cancer) was a prerequisite to getting

high-quality annotations. As stated above, we believe that for public data sets

this could be a pre-competitive activity where one could outsource the

data curation to experts and the work could be funded through a variety

of organizations.
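
The kind of query described above ('Find all genes upregulated consistently in breast cancer', combined with a drug target database) can be illustrated with a minimal Python sketch. It is not the Atlas API: the record layout, the gene and drug names, the fold-change values, and the thresholds are all hypothetical placeholders, and the definition of "consistently" (upregulated in every study that measured the gene, with a minimum number of studies) is only one possible assumption. The point it shows is that such a query is only possible because every study is annotated with the same controlled disease term.

```python
from collections import defaultdict

# Hypothetical curated records (all values are made up for illustration):
# (study id, ontology-controlled disease term, gene, log2 fold change vs normal)
RECORDS = [
    ("STUDY_1", "breast carcinoma", "GENE_A", 2.1),
    ("STUDY_2", "breast carcinoma", "GENE_A", 1.7),
    ("STUDY_3", "breast carcinoma", "GENE_A", 1.2),
    ("STUDY_1", "breast carcinoma", "GENE_B", 1.5),
    ("STUDY_2", "breast carcinoma", "GENE_B", -0.4),
    ("STUDY_3", "breast carcinoma", "GENE_B", 1.1),
    ("STUDY_1", "breast carcinoma", "GENE_C", -1.3),
    ("STUDY_4", "lung carcinoma", "GENE_D", 2.4),
]

# Hypothetical drug target table: gene -> drugs known to act on it.
DRUG_TARGETS = {"GENE_A": ["DRUG_X"], "GENE_D": ["DRUG_Y"]}


def consistently_upregulated(records, disease, min_log2fc=1.0, min_studies=2):
    """Return genes upregulated in every study of `disease` that measured them.

    A gene qualifies only if it appears in at least `min_studies` studies and
    its log2 fold change is >= `min_log2fc` in all of them (assumed definition).
    """
    per_gene = defaultdict(dict)
    for study, term, gene, log2fc in records:
        # Exact term matching works only because curation used one vocabulary.
        if term == disease:
            per_gene[gene][study] = log2fc
    hits = [
        gene
        for gene, by_study in per_gene.items()
        if len(by_study) >= min_studies
        and all(value >= min_log2fc for value in by_study.values())
    ]
    return sorted(hits)


def with_drug_targets(genes, drug_targets):
    """Keep only the genes that have an entry in the drug target table."""
    return {gene: drug_targets[gene] for gene in genes if gene in drug_targets}


if __name__ == "__main__":
    genes = consistently_upregulated(RECORDS, "breast carcinoma")
    print(genes)
    print(with_drug_targets(genes, DRUG_TARGETS))
```

With inconsistently annotated legacy data (for example, free-text disease labels that differ between studies), the exact-match filter above would silently miss studies, which is why the curation effort matters.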

Another learning from collaborating with an open

source project concerns the provision of support. Typically when dealing

with a vendor, certain service levels of support and documentation to

help us install, configure, and deploy a system such as the Atlas are

expected. Even then, it is likely that there will be certain teething problems

in going from a relatively open environment to a large controlled

corporate data center. One of the essential components of a successful

collaboration is good communication, and this was certainly the case
