Biomedical Engineering Reference
In-Depth Information
databases and does an enrichment analysis over GO and KEGG and licensed
applications such as MetaCore from GeneGo, IPA from Ingenuity, and
Pathway Studio from Ariadne Genomics. For drugs the application federates
the National Library of Medicine Drug Information Portal.
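
The text does not say which statistic the enrichment analysis uses. A common choice for GO and KEGG term enrichment is a one-sided hypergeometric test, and the following is a minimal sketch under that assumption. The function name and its parameters are hypothetical and are not taken from the system described here.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes annotated with the term (e.g. a GO term)
    selected:   size of the gene list being tested
    overlap:    genes in the list that carry the annotation
    (All names are illustrative assumptions.)
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

A small p-value suggests that the annotation occurs in the gene list more often than chance would explain. In practice the values would also be corrected for multiple testing across all terms examined.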
16.7.5 Text Indexing and NLP
Access to text sources is provided in four ways. First, as described
previously for retrieving textual reference information about a gene of
interest, the application federates the PubMed interface to MEDLINE
abstracts and Google Scholar.
Second, simple text indexing is provided through Lucene [16], an
open-source text indexer. Currently, the system indexes text sources such as
group folders and repositories of abstracts from scientific conferences.
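
Lucene is a Java library built around an inverted index, which maps each term to the documents that contain it. The sketch below illustrates that idea in plain Python. It does not use Lucene's API, and the document identifiers and texts are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical stand-ins for group folders and conference abstracts.
documents = {
    "abstract-1": "IL-6 expression in rheumatoid arthritis",
    "abstract-2": "EGFR mutations in lung cancer",
    "folder-note-7": "IL-6 as a biomarker in lung cancer",
}
index = build_index(documents)
matches = search(index, "lung cancer")
```

A production indexer adds stemming, stop-word handling, and relevance ranking on top of this structure. Those features are omitted to keep the sketch short.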
Third, we decided to curate a set of scholarly articles for biomedical
assertions. At the start of the project, the users identified a set of
important biomedical concepts, such as particular diseases and targets.
Published scholarly articles contain a wealth of information about these
topics of interest, but it is a challenge to use computational approaches
such as text indexing or even natural language processing (NLP) algorithms
[25] to extract quantitative facts with high accuracy. Quantitative facts,
such as the number of subjects or the percentage of observations used within
a study to establish an assertion, are critical for decision making in areas
such as biomarker and disease indication selection. We therefore used the
services of a team of biologists, who extracted biomedical facts from a
selected set of journal articles. The extraction was done using a predefined
structured template, and the data were subsequently tagged, stored, and made
searchable within tranSMART.
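
The fields of the structured template are not listed in the text. As an illustration only, a curated assertion could be modeled as a record like the one below. Every field name is a hypothetical assumption, chosen to reflect the quantitative facts mentioned above (number of subjects, percentage of observations) and the tagging step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CuratedAssertion:
    """One manually extracted biomedical fact (all field names are assumed)."""
    article_id: str                      # identifier of the source article
    concept: str                         # disease or target of interest
    statement: str                       # the assertion in the curator's words
    n_subjects: Optional[int] = None     # number of subjects in the study
    percent_observed: Optional[float] = None  # percentage of observations
    tags: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # Reject values that cannot be valid quantitative facts.
        if self.n_subjects is not None and self.n_subjects < 0:
            raise ValueError("n_subjects must be non-negative")
        if self.percent_observed is not None and not (
            0.0 <= self.percent_observed <= 100.0
        ):
            raise ValueError("percent_observed must be between 0 and 100")


def find_by_tag(records, tag):
    """Return the assertions that carry the given tag."""
    return [r for r in records if tag in r.tags]


# Placeholder records; the values are invented for the example.
records = [
    CuratedAssertion("example-article-1", "disease A", "marker X is elevated",
                     n_subjects=120, percent_observed=62.5,
                     tags=frozenset({"biomarker"})),
    CuratedAssertion("example-article-2", "target B", "target B is expressed",
                     tags=frozenset({"target"})),
]
biomarker_records = find_by_tag(records, "biomarker")
```

Validating at construction time keeps malformed quantitative values out of the searchable store, which matters when the numbers feed decisions such as biomarker selection.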
While the manual process provides high-accuracy extraction of biomedical
assertions, it is very resource intensive and, consequently, its coverage is
limited. Therefore, fourth, tranSMART also provides access to assertions
about biomarkers generated from MEDLINE by the Ariadne Genomics MedScan
Reader engine. For this application we have been devoting considerable
resources to improving the accuracy of the NLP engine and the fidelity of
assertion extraction for specific scientific subdomains such as immunology,
oncology, and clinical trials by developing specific text-mining cartridges.
16.7.6 Workflows
In silico analyses rarely consist of only one step; they are typically the
result of multistep, complex workflows. The system addresses some of these
workflows by extending the data export capability and deploying specific
interfaces to other applications, such as Microsoft Excel and Ariadne
Genomics Pathway Studio, for further mining and modeling of data. For
example, gene expression data can be exported directly to Ariadne Genomics
Pathway Studio with a single click and further analyzed in the external
application. Here pathway and