Building disease and target knowledge with Semantic MediaWiki - Open Source Software in Life Science Research

Figure 17.1 Data loading architecture

As Chapter 16 by Alquier describes the core SMW system in detail, here we highlight the important Targetpedia-specific elements of the architecture:

■ Source management: for external sources (public and licensed commercial), a variety of data replication and scheduling tools were used (including FTP, AutoSys [19] and Oracle materialised views) to manage regular updates from source into our data warehouse. Most data sets are then indexed and made queryable by loading them into Oracle, Lucene [20] or SRS [21]. Data sources use a vast array of different identifiers for biomedical concepts such as genes, proteins and diseases. We used an internal system (similar to systems such as BridgeDb [22]) to provide mappings between the different identifiers for the same entity. Multiprotein targets were sourced from our previously described internal drug target database [2]. Diseases were mapped to our internal disease dictionary, which is an augmented form of the disease and condition branches of MeSH [23].

■ Data provision: for each source, the queries required to obtain information for the wiki were identified. In many instances this took the form of summaries and aggregations rather than simply extracting data 'as-is'
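
The two elements above, identifier mapping and summary-style data provision, can be illustrated with a minimal sketch. It is not the Targetpedia implementation: an in-memory SQLite database stands in for the Oracle warehouse, and every table name, column name and identifier below (`xref`, `record`, `TARGET_1`, `source_x` and so on) is a hypothetical placeholder chosen for illustration. Records from two sources that use different identifiers are first resolved to a single internal identifier, and the wiki is then given a per-target aggregate rather than the raw rows.

```python
import sqlite3

# Hypothetical cross-reference rows: (source, external_id, internal_id).
# An identifier-mapping service would play this role in a real system.
XREFS = [
    ("source_x", "GENE_A", "TARGET_1"),
    ("source_y", "PROT_A", "TARGET_1"),
    ("source_x", "GENE_B", "TARGET_2"),
]

# Hypothetical source records: (source, external_id, disease).
# Each source refers to the same entity by its own identifier.
RECORDS = [
    ("source_x", "GENE_A", "disease_a"),
    ("source_y", "PROT_A", "disease_a"),
    ("source_y", "PROT_A", "disease_b"),
    ("source_x", "GENE_B", "disease_a"),
]


def build_summary(xrefs, records):
    """Return one (internal_id, n_records, n_diseases) row per target."""
    conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    conn.execute(
        "CREATE TABLE xref (source TEXT, external_id TEXT, internal_id TEXT)"
    )
    conn.execute(
        "CREATE TABLE record (source TEXT, external_id TEXT, disease TEXT)"
    )
    conn.executemany("INSERT INTO xref VALUES (?, ?, ?)", xrefs)
    conn.executemany("INSERT INTO record VALUES (?, ?, ?)", records)

    # Resolve each record to its internal identifier, then aggregate
    # instead of returning the source rows as they are.
    rows = conn.execute(
        """
        SELECT x.internal_id,
               COUNT(*)                  AS n_records,
               COUNT(DISTINCT r.disease) AS n_diseases
        FROM record AS r
        JOIN xref AS x
          ON x.source = r.source AND x.external_id = r.external_id
        GROUP BY x.internal_id
        ORDER BY x.internal_id
        """
    ).fetchall()
    conn.close()
    return rows


summary = build_summary(XREFS, RECORDS)
for internal_id, n_records, n_diseases in summary:
    print(internal_id, n_records, n_diseases)
```

Joining on the pair (source, external identifier) rather than on the identifier alone is a design choice made for this sketch: it avoids accidental collisions when two sources happen to reuse the same string for different entities.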
