Data Integration Solution for Organ-Specific Studies: An Application for Oral Biology - Biomedical Engineering Systems and Technologies


replicated whilst other data are simply connected through links and identifiers. This approach was integrated into the Oralome project.

Once target resources and data were identified, the modelling iteration started. This task consisted of designing a common information model to support oral cavity data from distinct resources.

Before the actual data integration, a system skeleton needed to be deployed. As mentioned above, several frameworks are designed for rapid prototyping of data portals for life science projects, such as LOVD or GMODWeb. For this task, we chose the Molgenis framework for its agility in creating a database and application, complete with a data exploration web workspace, REST and SOAP web services, and an R interface out of the box. For the data integration process, Molgenis provides easy and direct data input, whether through the web interface, through any of the available services, or through a provided database API. Therefore, custom data wrappers that collect data from miscellaneous resources can be easily implemented. Oralome required the deployment of general-purpose wrappers that combine external data in the newly deployed Molgenis instance. These wrappers allow for systematic information extraction from resources such as UniProt, NCBI or STRING, amongst others. These resources provide several ways to retrieve information, such as REST interfaces or APIs for Java development.
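To illustrate what such a wrapper might do, the sketch below parses one entry in UniProt's flat-text format and keeps only the accession, the organism and a few cross-references (OMIM, which UniProt lists as MIM, plus KEGG, PDB and GO). It is a minimal sketch and not the Oralome implementation: the `ProteinRecord` class, the `parse_uniprot_flat` function and the sample entry are assumptions made for this example, and the accession `X00000` and the KEGG and PDB identifiers in it are invented placeholders. Retrieving the entry over REST and loading the result into Molgenis are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProteinRecord:
    """Normalized view of one protein, ready to be loaded into a local database."""
    accession: str = ""
    organism: str = ""
    # Maps a database name (e.g. "PDB") to the identifiers found for it.
    xrefs: Dict[str, List[str]] = field(default_factory=dict)


# Cross-reference databases to keep; "MIM" is the label UniProt uses for OMIM.
WANTED_DATABASES = {"MIM", "KEGG", "PDB", "GO"}


def parse_uniprot_flat(text: str) -> ProteinRecord:
    """Extract accession, organism and selected cross-references from one entry."""
    record = ProteinRecord()
    organism_parts = []
    for line in text.splitlines():
        # Each line starts with a two-letter code followed by three spaces.
        code, _, payload = line.partition("   ")
        payload = payload.strip()
        if code == "AC" and not record.accession:
            # The first accession on the first AC line is the primary one.
            record.accession = payload.split(";")[0].strip()
        elif code == "OS":
            organism_parts.append(payload)
        elif code == "DR":
            parts = [p.strip() for p in payload.split(";")]
            if len(parts) >= 2 and parts[0] in WANTED_DATABASES:
                record.xrefs.setdefault(parts[0], []).append(parts[1])
    record.organism = " ".join(organism_parts).rstrip(".")
    return record


# Hypothetical entry, written by hand for this example (not a real record).
SAMPLE = """\
ID   EXAMPLE_STRMU           Unreviewed;       100 AA.
AC   X00000;
OS   Streptococcus mutans.
DR   EMBL; AE000000; AAA00000.1; -; Genomic_DNA.
DR   KEGG; smu:SMU_0000; -.
DR   PDB; 0XXX; X-ray; 2.00 A; A=1-100.
DR   GO; GO:0005576; C:extracellular region; IEA:UniProtKB-SubCell.
//
"""

if __name__ == "__main__":
    print(parse_uniprot_flat(SAMPLE))
```

Keeping the parser separate from retrieval and loading means the same function could sit behind any of the access routes mentioned above, and analogous parsers for NCBI or STRING could return the same `ProteinRecord` shape.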

Executing this streamlined data integration workflow collects curated oral cavity data and reorganizes them in a publicly available web framework.

3.3 Oralome Development

Oralome consists of a set of tools and a database that provide access to information related to several entities, such as microorganisms, proteins, diseases and pathways, integrating crucial data regarding the oral cavity.

The top-level entity is the microorganism, which has several associated proteins. A protein in turn has other identifiers linked to it, such as OMIM (Online Mendelian Inheritance in Man), KEGG (Kyoto Encyclopedia of Genes and Genomes), PDB (Protein Data Bank) and GO (Gene Ontology) terms. The main subject of this tool consists of two groups of proteins: (1) a subset of microbial proteins determined experimentally, and (2) microbial proteins expected to be present in saliva. Regarding the first group, besides the information retrieved from UniProt, Oralome will integrate information related to the environment where a protein was identified (health or disease, regulation, age group, and the particular source where it resides, for instance, mucosa or tongue).
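The entity hierarchy described above can be sketched as a small set of Python data classes. This is only one plausible reading of the text, not the actual Oralome schema: all class and field names, the `proteins_from_source` helper and the example values (including the placeholder accession `X00000`) are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Environment:
    """Context in which an experimentally determined protein was identified."""
    condition: str   # "health" or "disease"
    regulation: str  # assumed vocabulary, e.g. "up" or "down"
    age_group: str
    source: str      # e.g. "mucosa" or "tongue"


@dataclass
class Protein:
    accession: str                   # UniProt accession
    experimentally_determined: bool  # True for group (1), False for group (2)
    omim: List[str] = field(default_factory=list)
    kegg: List[str] = field(default_factory=list)
    pdb: List[str] = field(default_factory=list)
    go_terms: List[str] = field(default_factory=list)
    # Environment details are described only for the experimental group.
    environment: Optional[Environment] = None


@dataclass
class Microorganism:
    """Top-level entity: one microorganism with its associated proteins."""
    name: str
    proteins: List[Protein] = field(default_factory=list)


def proteins_from_source(organism: Microorganism, source: str) -> List[Protein]:
    """Return the proteins of an organism that were identified in a given source."""
    return [
        p for p in organism.proteins
        if p.environment is not None and p.environment.source == source
    ]


if __name__ == "__main__":
    organism = Microorganism(name="Streptococcus mutans")
    organism.proteins.append(
        Protein(
            accession="X00000",  # placeholder, not a real accession
            experimentally_determined=True,
            environment=Environment("disease", "up", "adult", "tongue"),
        )
    )
    organism.proteins.append(
        Protein(accession="X00001", experimentally_determined=False)
    )
    print([p.accession for p in proteins_from_source(organism, "tongue")])
```

Making `environment` optional reflects that the text attaches this context to the experimentally determined group, while proteins merely expected to be present in saliva carry only their identifiers.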

For Oralome development, we chose the Molgenis framework to generate the tools and features needed to start compiling our database and to view these data quickly and easily.

Molgenis is a framework written in Java that accepts two XML files as input: a database descriptor file and a user interface descriptor file. In the first file, users specify how the database will be structured, including its entities and relations; the second file specifies the layout of the web interface. Molgenis generates a Java model and a database API, which are used to deploy the related SQL tables, web services and web interface to a web server (Fig. 2).
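The general idea of generating database tables from a declarative descriptor can be illustrated with a toy generator. The sketch below is not Molgenis and does not reproduce its descriptor format: the XML element and attribute names (`model`, `entity`, `field`, `type`, `target`), the type mapping and the `generate_ddl` function are all hypothetical, chosen only to show how entities and relations in a descriptor could be turned into SQL table definitions.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor vocabulary, invented for this example.
DESCRIPTOR = """\
<model name="oralome_sketch">
  <entity name="Microorganism">
    <field name="name" type="string"/>
  </entity>
  <entity name="Protein">
    <field name="accession" type="string"/>
    <field name="microorganism" type="ref" target="Microorganism"/>
  </entity>
</model>
"""

# Assumed mapping from descriptor types to SQL column types.
SQL_TYPES = {"string": "VARCHAR(255)", "int": "INTEGER"}


def generate_ddl(descriptor_xml: str) -> list:
    """Turn each <entity> into a CREATE TABLE statement."""
    root = ET.fromstring(descriptor_xml)
    statements = []
    for entity in root.findall("entity"):
        # Every table gets a surrogate integer key.
        columns = ["id INTEGER PRIMARY KEY"]
        for field_el in entity.findall("field"):
            name = field_el.get("name")
            if field_el.get("type") == "ref":
                # A relation becomes a foreign key to the target entity.
                target = field_el.get("target")
                columns.append(f"{name}_id INTEGER REFERENCES {target}(id)")
            else:
                columns.append(f"{name} {SQL_TYPES[field_el.get('type')]}")
        table = entity.get("name")
        statements.append(f"CREATE TABLE {table} ({', '.join(columns)});")
    return statements


if __name__ == "__main__":
    for statement in generate_ddl(DESCRIPTOR):
        print(statement)
```

In a real model-driven framework the same descriptor would also drive the generated object model, services and user interface; this sketch covers only the table definitions.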
