MIAME-compliant gene expression data. Proteomics data repositories have
also been developed, including the Global Proteome Machine Database
(GPMDB) (http://gpmdb.thegpm.org), the Proteomics Identifications
Database (PRIDE) [14] and PeptideAtlas [15].
Due to these efforts, investigators from various fields, including biomedical
sciences and computational biology, can access public data repositories to
enhance their research. For example, Lukk and colleagues constructed a global
gene expression map from more than 5000 samples, drawn from 206 studies
conducted in 163 laboratories, that they obtained from GEO and ArrayExpress.
Studies on this scale would have required huge collaborative efforts and
most likely would not have been possible without public data repositories.
Another benefit is the possibility of validating results in an external
cohort of subjects whose data were made public in one of these databases. More
importantly, smaller laboratories that lack the resources to collect their
own samples and generate gene expression data can now use publicly
available data to make important discoveries.
Many journals, including the New England Journal of Medicine, require that
the data be posted before a manuscript is even considered for review.
The benefit is that other researchers can replicate published results and
build on the published work. The drawback arises when collaborating with
pharmaceutical or other companies or institutions that would like to mine
the data further before releasing it to the world: the requirement might
make it impossible to submit manuscripts to the journals of choice.
3.6.2 Data Storage and Management
One of the requirements of collaborating within the field of computational
biology is the need to merge data from the different groups. This involves not
only storing the data but also managing it properly so that data can
be merged and queried seamlessly. Storing data from collaborators might
require only extra hard drive space, but in most cases it entails far more.
Usually, data security in the form of multitier, login-based access to the
data is needed. If the data are on human subjects, data anonymity is also a
must. Although there are guidelines for the storage and management of
transcriptomics and proteomics data, metadata might need to be addressed in a
different manner. In most cases, metadata collected by different research
groups cannot be merged easily. Merging might require building a new database
and finding relations between the different sources so that tables in the
different schemas or databases can be joined.
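As an illustration of what finding relations between sources can involve, the following is a minimal sketch, not a recommended design. It assumes two hypothetical groups that recorded subject metadata under different identifiers, column names, codings, and units. All table names, column names, and values are invented for the example. A small mapping table (here called id_map) links each group's own identifier to a shared study-wide identifier, and the query harmonizes the columns while joining through that table.

```python
import sqlite3

# All tables, columns, and values below are hypothetical and exist only
# to illustrate the idea; they do not come from any real study.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Group A: sex coded as F/M, age stored in years.
    CREATE TABLE lab_a_samples (
        sample_id TEXT PRIMARY KEY, sex TEXT, age_years INTEGER);
    -- Group B: sex spelled out, age stored in months.
    CREATE TABLE lab_b_subjects (
        subject_code TEXT PRIMARY KEY, gender TEXT, age_months INTEGER);
    -- Mapping table relating each source identifier to a shared study ID.
    CREATE TABLE id_map (
        source TEXT, source_id TEXT, study_id TEXT,
        PRIMARY KEY (source, source_id));
""")
conn.executemany("INSERT INTO lab_a_samples VALUES (?, ?, ?)",
                 [("A-001", "F", 54), ("A-002", "M", 61)])
conn.executemany("INSERT INTO lab_b_subjects VALUES (?, ?, ?)",
                 [("b17", "female", 600)])
conn.executemany("INSERT INTO id_map VALUES (?, ?, ?)",
                 [("lab_a", "A-001", "S1"),
                  ("lab_a", "A-002", "S2"),
                  ("lab_b", "b17", "S3")])


def merged_metadata(connection):
    """Return (study_id, sex, age_years) rows harmonized across both groups."""
    query = """
        SELECT m.study_id, a.sex, a.age_years
        FROM lab_a_samples AS a
        JOIN id_map AS m
          ON m.source = 'lab_a' AND m.source_id = a.sample_id
        UNION ALL
        SELECT m.study_id,
               -- Recode group B's spelling to group A's F/M coding.
               CASE b.gender WHEN 'female' THEN 'F' WHEN 'male' THEN 'M' END,
               -- Integer division converts months to whole years.
               b.age_months / 12
        FROM lab_b_subjects AS b
        JOIN id_map AS m
          ON m.source = 'lab_b' AND m.source_id = b.subject_code
        ORDER BY 1
    """
    return connection.execute(query).fetchall()


rows = merged_metadata(conn)
for row in rows:
    print(row)
```

Because the shared study_id carries no group-specific information, a table of this kind could also serve as the point at which original subject identifiers are kept apart from the merged data, although real anonymization of human-subject data involves considerably more than renaming keys.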
The establishment of standards enabled the development of new data management
and data analysis tools to support collaborative and multidisciplinary
studies. For example, MicroGen is a Web system used to store, manage, and
exchange data characterizing spotted microarray experiments according to the
MIAME standards [17]. Similar examples are EDGE(3) for Agilent two-color
microarray experiments [18], MARS (Microarray Analysis and Retrieval