Biomedical Engineering Reference
In-Depth Information
Figure 17.1: Data loading architecture
As Chapter 16 by Alquier describes the core SMW system in detail, here
we will highlight the important Targetpedia-specific elements of the
architecture, such as:
Source management: for external sources (public and licensed
commercial), a variety of data replication and scheduling tools were
used (including FTP, AutoSys [19] and Oracle materialised views) to
manage regular updates from source into our data warehouse. Most
data sets are then indexed and made queryable by loading them into
Oracle, Lucene [20] or SRS [21]. Data sources use a vast array of
different identifiers for biomedical concepts such as genes, proteins
and diseases. We used an internal system (similar to systems such as
BridgeDb [22]) to provide mappings between different identifiers for
the same entity. Multiprotein targets were sourced from our previously
described internal drug target database [2]. Diseases were mapped to
our internal disease dictionary, which is an augmented form of the
disease and condition branches of MeSH [23].
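The identifier mapping described under source management can be illustrated with a minimal sketch. It is not the internal system used for Targetpedia, nor the BridgeDb API; the class name IdentifierMapper, its methods and the namespace labels are hypothetical and chosen only for illustration. The idea is simply to group identifiers that denote the same entity and translate between namespaces via that group.

```python
from collections import defaultdict


class IdentifierMapper:
    """Toy cross-reference index (hypothetical, for illustration only).

    Groups identifiers from different namespaces that denote the same
    biomedical entity, so one identifier can be translated into another.
    """

    def __init__(self):
        self._entity_of = {}               # (namespace, identifier) -> entity key
        self._members = defaultdict(dict)  # entity key -> {namespace: identifier}

    def add_equivalence(self, entity, xrefs):
        """Register all identifiers in `xrefs` as names for `entity`."""
        for namespace, identifier in xrefs.items():
            self._entity_of[(namespace, identifier)] = entity
            self._members[entity][namespace] = identifier

    def translate(self, namespace, identifier, target_namespace):
        """Return the equivalent identifier in `target_namespace`, or None."""
        entity = self._entity_of.get((namespace, identifier))
        if entity is None:
            return None
        return self._members[entity].get(target_namespace)


# Example entry: the human TP53 gene as named by three public resources.
mapper = IdentifierMapper()
mapper.add_equivalence(
    "TP53",
    {"entrez": "7157", "uniprot": "P04637", "ensembl": "ENSG00000141510"},
)

print(mapper.translate("uniprot", "P04637", "entrez"))
```

A production mapping service would also have to handle one-to-many relationships (for example, one gene encoding several protein isoforms) and retired identifiers, which this sketch deliberately ignores.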
Data provision: for each source, queries required to obtain information
for the wiki were identified. In many instances this took the form of
summaries and aggregations rather than simply extracting data 'as-is'