knowledge. Perhaps the best-known example of extracting information from biomedical literature to identify new knowledge is a study done by Swanson in 1986 [35]. In this study, Swanson extracted key concepts associated with fish oil and Raynaud's disease. By identifying literature suggesting that Raynaud's disease was associated with increased blood viscosity, as well as literature describing the viscosity-reducing effect of fish oil, Swanson was able to use the transitive property to suggest the hypothesis that fish oil may be used as a treatment for Raynaud's disease. The subsequent validation of this hypothesis through a clinical trial [36] has provided the underpinning inspiration for how new knowledge might be discovered from biomedical text that is systematically organized with resources like MEDLINE. Perhaps the most important aspect of this study, however, was that it leveraged a process of systematically analyzing relevant documents so that key facts could be identified and later combined through logical relationships.
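The transitive (A-B-C) linkage behind Swanson's hypothesis can be sketched in a few lines. This is an illustrative toy, not his actual pipeline: the concept pairs and the function name are hypothetical stand-ins for what would come from analyzing the two literatures.

```python
# Illustrative sketch of Swanson-style A-B-C discovery.
# Each set holds (concept, concept) associations extracted from a literature.

# Associations reported in the Raynaud's literature (A -> B)
raynaud_links = {("raynaud's disease", "blood viscosity")}

# Associations reported in the fish-oil literature (B -> C)
fish_oil_links = {("blood viscosity", "fish oil")}

def transitive_hypotheses(ab_links, bc_links):
    """Propose A-C links whenever A-B and B-C share a bridging concept B."""
    hypotheses = set()
    for a, b1 in ab_links:
        for b2, c in bc_links:
            if b1 == b2:  # shared bridging concept enables the transitive step
                hypotheses.add((a, c))
    return hypotheses

print(transitive_hypotheses(raynaud_links, fish_oil_links))
# -> {("raynaud's disease", "fish oil")}
```

The two literatures never mention each other directly; the candidate treatment emerges only because both mention the bridging concept, blood viscosity.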
The promise of bibliomining is embodied in Swanson's discovery, which has since inspired algorithmic approaches that aspire to recreate human intuition in identifying potential relationships. The Swanson study demonstrates the potential to identify new knowledge, but it also highlights the importance of developing systematic techniques for extracting information from biomedical literature. Since the original study, a computer-mediated system called ARROWSMITH has been developed that enables one to identify potential linkages between two sets of MEDLINE searches [37, 38]. The ARROWSMITH system is built on the principle that common words or phrases occurring in two sets of documents may be used to identify potentially interesting linkages and thus suggest testable hypotheses. The challenge with language, however, is that a variety of terms may be used to describe the same archetypal concept. This is where NLU systems are essential for mediating between what was written and what was meant. Meaning (or "semantics") is the underpinning challenge of NLU and metadata indexing systems.
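The core ARROWSMITH principle of surfacing terms shared between two document sets can be sketched as a set intersection. The documents, the stop-word list, and the function names below are illustrative assumptions, not the real system's data or API.

```python
# Hypothetical sketch of the ARROWSMITH principle: terms common to two
# MEDLINE result sets suggest candidate linking concepts.

STOP_WORDS = {"the", "of", "in", "and", "a", "is", "with"}

def terms(documents):
    """Collect the non-stop-word terms appearing anywhere in a document set."""
    found = set()
    for doc in documents:
        for word in doc.lower().split():
            if word not in STOP_WORDS:
                found.add(word)
    return found

def linking_terms(set_a, set_b):
    """Terms shared by both document sets: candidate bridging concepts."""
    return terms(set_a) & terms(set_b)

set_a = ["raynaud phenomenon involves elevated blood viscosity"]
set_b = ["dietary fish oil reduces blood viscosity"]
print(linking_terms(set_a, set_b))  # -> {'blood', 'viscosity'}
```

As the surrounding text notes, this word-level matching is exactly where synonymy bites: "viscosity" and "thickness" would never intersect, which is why NLU-style normalization of terms to concepts matters.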
The general principles that underpin the ARROWSMITH system can be described through what are referred to as "modeling" algorithms. The essence of these modeling approaches is that identified concepts are placed into a mathematical construct that enables the identification of relationships between the concepts. Relationships are of two general types: (1) direct, where concepts explicitly co-occur in a specified context (e.g., in the same document); or (2) indirect, where concepts are related through inferred relationships based on some logical formalism (e.g., in the case of Swanson's study, through a bridging concept that enabled the transitive property to relate otherwise unrelated concepts). Depending on the particular representation approach used, various weights can be applied to each relationship. There are a number of ways that weights can be calculated, including those based on direct frequency or weighted frequency. Direct frequency approaches are based on a simple tabulation of how often a particular relationship occurs; weighted frequency approaches tabulate how often a given relationship occurs, normalized by how common the relationship is in the universe of all possible relationships.
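The two weighting schemes can be sketched over a toy corpus. The per-document concept lists below are hypothetical, and normalizing by the total number of observed co-occurrences is just one simple choice for the "universe of all possible relationships".

```python
# Illustrative sketch of direct vs. weighted frequency for concept pairs.
from collections import Counter
from itertools import combinations

# Hypothetical per-document concept extractions
docs = [
    ["fish oil", "blood viscosity"],
    ["fish oil", "blood viscosity"],
    ["blood viscosity", "raynaud's disease"],
]

# Direct frequency: simple tabulation of each same-document co-occurrence.
direct = Counter()
for concepts in docs:
    for pair in combinations(sorted(concepts), 2):
        direct[pair] += 1

# Weighted frequency: each count normalized by the total number of
# co-occurrences observed across the corpus.
total = sum(direct.values())
weighted = {pair: count / total for pair, count in direct.items()}

print(direct[("blood viscosity", "fish oil")])    # -> 2
print(weighted[("blood viscosity", "fish oil")])  # -> 0.666...
```

Normalization matters because an extremely common concept (here, "blood viscosity") co-occurs with many things by sheer frequency; weighting damps those ubiquitous pairs relative to rarer, more specific ones.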