knowledge. Perhaps the best-known example of extracting information from biomedical literature to identify new knowledge is a study done by Swanson in 1986 [35]. In this study, Swanson extracted key concepts associated with fish oil and Raynaud's disease. By identifying literature suggesting that Raynaud's disease was associated with increased blood viscosity, as well as literature describing the viscosity-reducing effect of fish oil, Swanson was able to use the transitive property to suggest the hypothesis that fish oil may be used as a treatment for Raynaud's disease. The subsequent validation of this hypothesis through a clinical trial [36] has provided the underpinning inspiration for how new knowledge might be discovered from biomedical text that is systematically organized with resources like MEDLINE. Perhaps the most important aspect of this study, however, was that it leveraged a process of systematically analyzing relevant documents so that key facts could be identified and later combined through logical relationships.
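The transitive (A-B-C) linkage behind Swanson's hypothesis can be sketched in a few lines. This is an illustrative toy, not his actual pipeline: the concept pairs and the function name are hypothetical stand-ins for what would come from analyzing the two literatures.

```python
# Illustrative sketch of Swanson-style A-B-C discovery.
# Each set holds (concept, concept) associations extracted from a literature.

# Associations reported in the Raynaud's literature (A -> B)
raynaud_links = {("raynaud's disease", "blood viscosity")}

# Associations reported in the fish-oil literature (B -> C)
fish_oil_links = {("blood viscosity", "fish oil")}

def transitive_hypotheses(ab_links, bc_links):
    """Propose A-C links whenever A-B and B-C share a bridging concept B."""
    hypotheses = set()
    for a, b1 in ab_links:
        for b2, c in bc_links:
            if b1 == b2:  # shared bridging concept enables the transitive step
                hypotheses.add((a, c))
    return hypotheses

print(transitive_hypotheses(raynaud_links, fish_oil_links))
# -> {("raynaud's disease", "fish oil")}
```

The two literatures never mention each other directly; the candidate treatment emerges only because both mention the bridging concept, blood viscosity.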
The promise of bibliomining is embodied in Swanson's discovery, which has since inspired algorithmic approaches that aspire to recreate human intuition in identifying potential relationships. The Swanson study demonstrates the potential to identify new knowledge, but it also highlights the importance of developing systematic techniques for extracting information from biomedical literature. Since the original study, a computer-mediated system called ARROWSMITH has been developed that enables one to identify potential linkages between two sets of MEDLINE searches [37, 38]. The ARROWSMITH system is built on the principle that common words or phrases occurring in two sets of documents may be used to identify potentially interesting linkages and thus suggest testable hypotheses. The challenge with language, however, is that a variety of terms may be used to describe the same archetypal concept. This is where NLU systems are essential for mediating between what was written and what was meant. Meaning (or "semantics") is the underpinning challenge of NLU and metadata indexing systems.
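The core ARROWSMITH principle of surfacing terms shared between two document sets can be sketched as a set intersection. The documents, the stop-word list, and the function names below are illustrative assumptions, not the real system's data or API.

```python
# Hypothetical sketch of the ARROWSMITH principle: terms common to two
# MEDLINE result sets suggest candidate linking concepts.

STOP_WORDS = {"the", "of", "in", "and", "a", "is", "with"}

def terms(documents):
    """Collect the non-stop-word terms appearing anywhere in a document set."""
    found = set()
    for doc in documents:
        for word in doc.lower().split():
            if word not in STOP_WORDS:
                found.add(word)
    return found

def linking_terms(set_a, set_b):
    """Terms shared by both document sets: candidate bridging concepts."""
    return terms(set_a) & terms(set_b)

set_a = ["raynaud phenomenon involves elevated blood viscosity"]
set_b = ["dietary fish oil reduces blood viscosity"]
print(linking_terms(set_a, set_b))  # -> {'blood', 'viscosity'}
```

As the surrounding text notes, this word-level matching is exactly where synonymy bites: "viscosity" and "thickness" would never intersect, which is why NLU-style normalization of terms to concepts matters.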
The general principles that underpin the ARROWSMITH system can be described through what are referred to as "modeling" algorithms. The essence of these modeling approaches is that identified concepts are placed into a mathematical construct that enables the identification of relationships between the concepts. Relationships are of two general types: (1) direct, where concepts explicitly co-occur in a specified context (e.g., in the same document); or (2) indirect, where concepts are related through inferred relationships based on some logical formalism (e.g., in the case of Swanson's study, through a bridging concept that enabled the transitive property to relate otherwise unrelated concepts). Depending on the particular representation approach used, various weights can be applied to each relationship. There are a number of ways that weights can be calculated, including those based on direct frequency or weighted frequency. Direct frequency approaches are based on a simple tabulation of how often a particular relationship occurs; weighted frequency approaches tabulate how often a given relationship occurs, normalized by how common the relationship is in the universe of all possible relationships.
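The two weighting schemes can be sketched over a toy corpus. The per-document concept lists below are hypothetical, and normalizing by the total number of observed co-occurrences is just one simple choice for the "universe of all possible relationships".

```python
# Illustrative sketch of direct vs. weighted frequency for concept pairs.
from collections import Counter
from itertools import combinations

# Hypothetical per-document concept extractions
docs = [
    ["fish oil", "blood viscosity"],
    ["fish oil", "blood viscosity"],
    ["blood viscosity", "raynaud's disease"],
]

# Direct frequency: simple tabulation of each same-document co-occurrence.
direct = Counter()
for concepts in docs:
    for pair in combinations(sorted(concepts), 2):
        direct[pair] += 1

# Weighted frequency: each count normalized by the total number of
# co-occurrences observed across the corpus.
total = sum(direct.values())
weighted = {pair: count / total for pair, count in direct.items()}

print(direct[("blood viscosity", "fish oil")])    # -> 2
print(weighted[("blood viscosity", "fish oil")])  # -> 0.666...
```

Normalization matters because an extremely common concept (here, "blood viscosity") co-occurs with many things by sheer frequency; weighting damps those ubiquitous pairs relative to rarer, more specific ones.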