to either developing an in-house portal or adapting open source code. In
collaborating with the Atlas team, we gained a great deal of expertise in
curating and managing a large collection of gene expression data, while at
the same time developing a usable prototype that serves as a proof of
concept for future development. There are several lessons we can take
away from this experience.
The first is the importance of data curation to the development of a
useful portal. The EBI Atlas uses an ontology to control the terms used in
curating the various gene expression studies it contains.
This ontology makes the data searchable in a semantically meaningful
way. It also opens the door to later deploying other algorithms that can
exploit these semantics across this large collection of data. For example,
we could go through the entire data collection and generate signatures of
gene changes between normal and various disease states. Once the
signatures have been computed, a gene set enrichment analysis [15] could
be performed for each disease state to reveal the pathways involved in
particular diseases. Being able to make a query such as 'Find all genes
upregulated consistently in breast cancer' would be another very powerful
use of this kind of semantically aware data. The results could then be
combined with other data, such as drug target databases.

This works well when all the data are curated by subject matter experts,
but what about in-house legacy data sets? It is difficult to get
consistent annotations and curations for these data sets. In many cases
the original scientists may have left the company long ago. Good,
user-friendly tools that make it easy to curate experimental metadata
would also help, and these are still sorely lacking. In this project, a
combination of in-house bioinformatics expertise and computational
biologists familiar with the subject matter (e.g. cancer) was a
prerequisite for obtaining high-quality annotations. As stated above, we
believe that for public data sets this could be a pre-competitive
activity. Data curation could be outsourced to experts, and the work could
be funded through a variety of organizations.
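The following minimal Python sketch makes the role of controlled terms concrete. It does two things:

- It maps free-text sample annotations onto a tiny ontology.
- It answers a query for a broad term by also matching every narrower term beneath it.

The term labels, the synonym table and the helper names (`curate`, `descendants`, `search`) are hypothetical illustrations. They are not the EBI Atlas ontology or any Atlas API.

```python
from collections import defaultdict

# Hypothetical toy ontology: term -> parent terms ("is_a" edges).
# Labels are illustrative only, not taken from any real ontology.
IS_A = {
    "breast carcinoma": ["carcinoma", "breast disease"],
    "lung adenocarcinoma": ["carcinoma", "lung disease"],
    "carcinoma": ["cancer"],
    "cancer": ["disease"],
    "breast disease": ["disease"],
    "lung disease": ["disease"],
    "disease": [],
    "normal": [],
}

# Hypothetical synonym table used at curation time to map free-text
# annotations (e.g. from legacy studies) onto controlled terms.
SYNONYMS = {
    "breast cancer": "breast carcinoma",
    "bc": "breast carcinoma",
    "lung adeno": "lung adenocarcinoma",
    "healthy": "normal",
    "control": "normal",
}

def curate(free_text):
    """Map a free-text annotation to a controlled term, or None if unknown."""
    key = free_text.strip().lower()
    if key in IS_A:
        return key
    return SYNONYMS.get(key)

def descendants(term):
    """Return the term plus every term that is (transitively) a kind of it."""
    children = defaultdict(list)
    for child, parents in IS_A.items():
        for parent in parents:
            children[parent].append(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found

def search(samples, query_term):
    """Sample IDs whose curated term is query_term or any narrower term."""
    wanted = descendants(query_term)
    return sorted(sid for sid, term in samples.items() if term in wanted)

# Hypothetical free-text annotations, as they might appear in legacy data.
raw = {"S1": "Breast cancer", "S2": "BC", "S3": "lung adeno",
       "S4": "healthy", "S5": "tumour?"}
curated = {sid: curate(text) for sid, text in raw.items()}
uncurated = [sid for sid, term in curated.items() if term is None]

print(search(curated, "cancer"))
print(search(curated, "breast disease"))
print("needs a curator:", uncurated)
```

In this sketch, a query for "cancer" also finds samples annotated only as "BC" or "lung adeno". That happens because curation mapped them to terms that sit below "cancer" in the hierarchy.

Sample S5 matches no term or synonym, so it is left for a human curator. This is exactly where the legacy data sets discussed above become costly.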
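The analysis described above can be sketched in Python under simplifying assumptions:

- Each study supplies one mean expression value per gene for normal samples and one for disease samples.
- A signature is the per-gene log2 fold change.
- A gene counts as "consistently upregulated" only if it passes the threshold in every study of that disease.

All gene names, expression values, the drug-target table and the pathway sets are made up. For brevity, the enrichment step swaps in a one-sided hypergeometric over-representation test. This is a simpler technique than the running-sum statistic of gene set enrichment analysis [15].

```python
import math

# Hypothetical data: mean expression (linear scale) per gene as
# (normal, disease), one dict per study of the same disease state.
STUDIES = {
    "breast carcinoma": [
        {"G1": (10, 42), "G2": (8, 30), "G3": (5, 5),
         "G4": (20, 6), "G5": (7, 15), "G6": (3, 3)},
        {"G1": (12, 50), "G2": (9, 20), "G3": (6, 7),
         "G4": (18, 4), "G5": (7, 8), "G6": (4, 4)},
    ],
}
# Hypothetical drug-target table and pathway gene sets.
DRUG_TARGETS = {"drug_A": {"G1"}, "drug_B": {"G4"}}
PATHWAYS = {"pathway_P": {"G1", "G2", "G3"}, "pathway_Q": {"G4", "G5", "G6"}}

def signature(study, pseudocount=1.0):
    """Signature of one study: log2 fold change, disease vs normal, per gene."""
    return {gene: math.log2((dis + pseudocount) / (norm + pseudocount))
            for gene, (norm, dis) in study.items()}

def consistently_up(studies, min_log2fc=1.0):
    """Genes measured in every study and up by >= min_log2fc in all of them."""
    sigs = [signature(s) for s in studies]
    shared = set.intersection(*(set(sig) for sig in sigs))
    return sorted(g for g in shared if all(sig[g] >= min_log2fc for sig in sigs))

def drugs_for(genes):
    """Join a gene list against the drug-target table."""
    return {g: sorted(d for d, targets in DRUG_TARGETS.items() if g in targets)
            for g in genes}

def overrepresentation_p(hits, gene_set, universe):
    """One-sided hypergeometric P(X >= observed overlap) of hits with gene_set."""
    big_n, n = len(universe), len(hits)
    k_set = len(gene_set & universe)
    k = len(hits & gene_set)
    tail = sum(math.comb(k_set, i) * math.comb(big_n - k_set, n - i)
               for i in range(k, min(n, k_set) + 1))
    return tail / math.comb(big_n, n)

for disease, studies in STUDIES.items():
    up = consistently_up(studies)
    universe = set.intersection(*(set(s) for s in studies))
    print(disease, "consistently up:", up)
    print("druggable:", {g: d for g, d in drugs_for(up).items() if d})
    for name, genes in PATHWAYS.items():
        print(name, round(overrepresentation_p(set(up), genes, universe), 3))
```

With these toy numbers:

- G5 is upregulated in only one study, so it drops out of the consistent set.
- G1 and G2 remain, and only G1 is a (hypothetical) drug target.

A real analysis would use replicate-level statistics rather than fold changes of means. It would also correct the pathway p-values for multiple testing.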
Another lesson from collaborating with an open source project concerns
support. When dealing with a vendor, we typically expect certain service
levels of support, along with documentation to help us install, configure,
and deploy a system such as the Atlas. Even then, there are likely to be
teething problems in moving from a relatively open environment to a large,
controlled corporate data center. Good communication is one of the
essential components of a successful collaboration, and this was certainly
the case