Building disease and target knowledge with Semantic MediaWiki - Open Source Software in Life Science Research


this hypothesis. SMW offered many of the capabilities we needed

'out of the box' - certainly enough to produce a working prototype.

In addition, knowing that the underlying MediaWiki software powers

both Wikipedia and our internal corporate wiki suggested that,

should we be successful, developing a production system would be

feasible.

■ Familiarity: as the majority of scientists within the company were

familiar with MediaWiki-based sites, and many of our specific target

customers had set up their own instances, we should not face too high

a barrier for adopting a new system.

■ Extensibility: although SMW had enough functionality to meet

early stage requirements, we anticipated that eventually we would

need to extend the system. The open codebase and modular design

were highly attractive here, allowing our developers to build new

components as required and enabling us to respond to our customers

quickly.

■ Semantic capabilities: a key element of functionality was the ability to

provide summarisation and taxonomy-based views across the proteins

(described in detail below). This is actually one of the most powerful

core capabilities of SMW and something not supported by many of the

alternatives. The feature is enabled by the 'ASK' query language [8],

which functions somewhat like SQL and can be embedded within wiki

pages to create dynamic and interactive result sets.
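
To give a feel for the kind of query described in the last point, the following is a minimal sketch, not the Targetpedia implementation. It assembles an SMW ask query string (conditions in double square brackets, followed by `|?` printout requests) and the corresponding request URL for SMW's `ask` API module. The category name, the property names (`Has species`, `Has gene symbol`, `Has pathway`) and the wiki URL are hypothetical and chosen purely for illustration.

```python
from urllib.parse import urlencode


def build_ask_query(conditions, printouts, limit=None):
    """Assemble an SMW ask query: [[conditions]] then |?printouts then |limit=N."""
    parts = ["".join(f"[[{c}]]" for c in conditions)]
    parts += [f"?{p}" for p in printouts]
    if limit is not None:
        parts.append(f"limit={limit}")
    return "|".join(parts)


def build_ask_url(api_base, query):
    """Build a URL for SMW's 'ask' API module (no request is sent here)."""
    params = {"action": "ask", "query": query, "format": "json"}
    return f"{api_base}?{urlencode(params)}"


# Hypothetical category and property names, for illustration only.
query = build_ask_query(
    conditions=["Category:Protein", "Has species::Homo sapiens"],
    printouts=["Has gene symbol", "Has pathway"],
    limit=20,
)

# Hypothetical wiki location; substitute the api.php endpoint of a real SMW site.
url = build_ask_url("https://wiki.example.org/api.php", query)

print(query)
print(url)
```

Within a wiki page, the same conditions and printouts would typically be written inside an `{{#ask: ... }}` parser function, so that the result table is regenerated whenever the underlying annotations change.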

Data sourcing

Using a combination of user guidance and access statistics from

legacy systems, we identified the major content elements required for

the wiki. For version one of Targetpedia, the entities chosen were:

proteins and protein targets, species, indications, pathways, biological

function annotations, Pfizer people, departments, projects and research

units.

For each entity we then identified the types and sources of data the

system needed to hold. Table 17.1 provides an excerpt of this analysis for

the protein/target entity type. In particular, we made use of our existing

infrastructure for text-mining of the biomedical literature, Pharmamatrix

(PMx, [9]). PMx works by automated, massive-scale analysis of Medline

and other text sources to identify associations between thousands of

biomedical entities. The results of this mining provide a rich data source

to augment many of the areas of scientific interest.
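
The text does not describe how PMx derives its associations, so the following is only a generic sketch of one common text-mining approach: counting how often pairs of entities are mentioned in the same document. The function name, the toy abstracts and the term list are all assumptions made for illustration, and the naive substring matching stands in for the entity recognition a production system would need.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, vocabulary):
    """Count how many documents mention each pair of vocabulary terms together.

    Matching is a case-insensitive substring test, which is far cruder than
    real entity recognition; it is used here only to keep the sketch short.
    """
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Sort so that each pair is always stored in the same order.
        found = sorted(term for term in vocabulary if term.lower() in lowered)
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts


# Toy abstracts and terms, invented for this example.
abstracts = [
    "TNF signalling drives inflammation in rheumatoid arthritis.",
    "IL6 and TNF are both elevated in rheumatoid arthritis.",
    "IL6 levels were measured in healthy volunteers.",
]
terms = {"TNF", "IL6", "rheumatoid arthritis"}

counts = cooccurrence_counts(abstracts, terms)
for pair, n in counts.most_common():
    print(pair, n)
```

Pair counts of this kind could then be loaded into the wiki as annotations linking, for example, a protein page to the indications it is most often mentioned with.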
