search engine. There is some indication that both IBM and Wolfram are interested
in applying their technologies to biomedicine (especially to provide more insight
into the costs of healthcare, but also to identify meaningful patterns associated
with disease), and there has been some progress within the research community.
The IBM Watson system, in fact, was built largely using the UIMA framework and is
a testament to the capability and potential power of community-driven frameworks.
It is also possible to integrate existing NLU systems such as MetaMap into
frameworks such as GATE or UIMA, which lets one combine well-known NLU systems
with new techniques.
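To illustrate why such frameworks make it practical to plug in existing systems, the following minimal Python sketch mimics the general pattern: independent annotators add standoff annotations (character offsets plus a label) to a shared document object, and a pipeline runs them in order. It is not the UIMA or GATE API. Every class and function name here (`Document`, `Annotation`, `SentenceAnnotator`, `WrappedConceptTagger`, `run_pipeline`) is hypothetical, and the small term dictionary is an assumed stand-in for a call to an external system such as MetaMap.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    label: str
    source: str  # name of the annotator that produced it


@dataclass
class Document:
    text: str
    annotations: list[Annotation] = field(default_factory=list)


class SentenceAnnotator:
    """Naive splitter: a sentence is a run of text ending in a period."""
    name = "sentence-splitter"

    def process(self, doc: Document) -> None:
        for m in re.finditer(r"\S[^.]*\.", doc.text):
            doc.annotations.append(Annotation(m.start(), m.end(), "Sentence", self.name))


class WrappedConceptTagger:
    """Hypothetical adapter for an existing concept recognizer.

    A real adapter would call the external tool (for example MetaMap) and
    convert its output; a small term dictionary stands in for that call here.
    """
    name = "wrapped-concept-tagger"

    def __init__(self, lexicon: dict[str, str]):
        self.lexicon = lexicon  # surface term -> concept label

    def process(self, doc: Document) -> None:
        for term, concept in self.lexicon.items():
            for m in re.finditer(re.escape(term), doc.text, flags=re.IGNORECASE):
                doc.annotations.append(Annotation(m.start(), m.end(), concept, self.name))


def run_pipeline(doc: Document, annotators) -> Document:
    """Run each annotator in order over the same shared document."""
    for annotator in annotators:
        annotator.process(doc)
    return doc


doc = run_pipeline(
    Document("Aspirin reduces fever. Metformin treats type 2 diabetes."),
    [SentenceAnnotator(),
     WrappedConceptTagger({"aspirin": "Drug", "type 2 diabetes": "Disease"})],
)
for a in doc.annotations:
    print(a.label, repr(doc.text[a.start:a.end]), a.source)
```

Because each annotator only reads the shared text and appends its own annotations, a wrapped external tool and a newly written component can be added, swapped, or reordered without changing each other, which is roughly the property that makes framework-based integration attractive.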
5.2.3 Role of Metadata and Indexing
In addition to knowledge that may be embedded in text, there are other sources
of information that can be used for knowledge discovery. These generally come in
the form of "metadata," which are simply defined as "data about data." In a
well-curated system, digital objects are associated with a range of metadata. The
most significant benefit of indexing initiatives, such as those led by the National
Library of Medicine for indexing MEDLINE [32], is the generation and application of
additional metadata that enable information retrieval systems to meet information
needs. The aforementioned MeSH descriptors, which subject matter experts and
librarians apply through a systematic review of content, enable one to retrieve
citations on a given topic (or a combination of topics, or even with certain topics
excluded) with high reliability. Thus, while indexing does not reflect every
possible topic, it does provide an accurate high-level aggregation of data objects
according to a systematic process. In the case of MEDLINE, MeSH descriptors allow
one to navigate a large corpus of biomedical literature according to more than
27,000 descriptors. Metadata can also be generic in form and function, for example
serving simply to enable discovery of the objects that are organized into a
collection.
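The topic-based retrieval described above (requiring several descriptors and excluding others) can be sketched as Boolean queries over an inverted index that maps each descriptor to the citations indexed with it. This is a minimal Python illustration under assumed data: the citation IDs are invented, the descriptor names serve only as labels, and `build_index` and `search` are hypothetical helpers, not a PubMed or NLM interface.

```python
from collections import defaultdict

# Hypothetical citations: invented IDs mapped to the MeSH descriptors assigned to them.
citations = {
    "1001": {"Diabetes Mellitus, Type 2", "Metformin"},
    "1002": {"Diabetes Mellitus, Type 2", "Insulin"},
    "1003": {"Hypertension", "Metformin"},
    "1004": {"Diabetes Mellitus, Type 2", "Metformin", "Child"},
}


def build_index(records: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert citation -> descriptors into descriptor -> citations."""
    index: dict[str, set[str]] = defaultdict(set)
    for cid, descriptors in records.items():
        for descriptor in descriptors:
            index[descriptor].add(cid)
    return dict(index)


def search(index, all_ids, include=(), exclude=()):
    """Return IDs indexed with every `include` descriptor and none of the `exclude` ones."""
    result = set(all_ids)
    for descriptor in include:
        result &= index.get(descriptor, set())  # AND: keep only citations with this descriptor
    for descriptor in exclude:
        result -= index.get(descriptor, set())  # NOT: drop citations with this descriptor
    return sorted(result)


index = build_index(citations)
print(search(index, citations, include=["Metformin"]))  # ['1001', '1003', '1004']
print(search(index, citations,
             include=["Diabetes Mellitus, Type 2", "Metformin"],
             exclude=["Child"]))  # ['1001']
```

Because descriptors are assigned by reviewers rather than inferred from raw text, set operations like these return exactly the citations tagged with a topic, which is where the reliability mentioned above comes from.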
In contemporary practice, metadata are applied to data objects in a systematic
manner. A popular metadata format applied to general digital data objects is Dublin
Core (DC) [33]. DC is designed for describing a wide array of digital objects, such
as those discoverable on the Internet. Within biomedicine, the largest repository of
publicly available biomedical literature (MEDLINE) is associated with nearly 90
metadata types that are formally described in a Document Type Definition (DTD)
schema. A DTD uses a formal declaration syntax, defined in the XML specification, to
describe the set of metadata types that can be associated with a particular digital
object. Through DTDs, digital objects become "machine readable," which promotes
the potential for computational approaches to the discovery of new knowledge. Beyond
DC and DTDs, additional metadata standards have emerged that allow for the
description of scientific research objects. One of particular note is the
Investigation, Study, and Assay tools (ISA-tools) [34]. The ISA-tools metadata
standard can be used to organize and publish the artifacts associated with a
particular study.
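To make the idea of machine-readable metadata concrete, the sketch below parses a small, hand-written record that borrows element names from MEDLINE/PubMed XML (`MedlineCitation`, `PMID`, `ArticleTitle`, `MeshHeading`, `DescriptorName`) and maps a few fields onto Dublin Core elements (`identifier`, `title`, `source`, `subject`) in the DC namespace `http://purl.org/dc/elements/1.1/`. The record content, including the PMID and journal title, is invented, and the field mapping in the hypothetical `medline_to_dc` function is an illustrative assumption rather than an official crosswalk. Python's standard-library `xml.etree.ElementTree` does not validate documents against a DTD, so the sketch only reads the structure a DTD would describe.

```python
import xml.etree.ElementTree as ET

# Dublin Core Metadata Element Set, version 1.1, namespace URI.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize DC elements with the "dc:" prefix

# Hypothetical, simplified record modeled on MEDLINE/PubMed XML element names;
# the PMID and journal title are placeholders, not a real citation.
RECORD = """<MedlineCitation>
  <PMID>12345678</PMID>
  <Article>
    <Journal><Title>Example Journal of Medicine</Title></Journal>
    <ArticleTitle>Metformin use in type 2 diabetes.</ArticleTitle>
  </Article>
  <MeshHeadingList>
    <MeshHeading><DescriptorName>Diabetes Mellitus, Type 2</DescriptorName></MeshHeading>
    <MeshHeading><DescriptorName>Metformin</DescriptorName></MeshHeading>
  </MeshHeadingList>
</MedlineCitation>"""


def medline_to_dc(xml_text: str) -> ET.Element:
    """Map a few MEDLINE-style fields onto Dublin Core elements (illustrative mapping)."""
    root = ET.fromstring(xml_text)
    dc = ET.Element("metadata")

    def add(element: str, value: str) -> None:
        child = ET.SubElement(dc, f"{{{DC_NS}}}{element}")
        child.text = value

    add("identifier", "PMID:" + root.findtext("PMID"))
    add("title", root.findtext("Article/ArticleTitle"))
    add("source", root.findtext("Article/Journal/Title"))
    # Each indexed MeSH descriptor becomes a dc:subject entry.
    for descriptor in root.iterfind("MeshHeadingList/MeshHeading/DescriptorName"):
        add("subject", descriptor.text)
    return dc


dc_record = medline_to_dc(RECORD)
print(ET.tostring(dc_record, encoding="unicode"))
```

The same MeSH descriptors that drive topic retrieval reappear here as generic `dc:subject` values, which shows how domain-specific indexing can be exposed through a general-purpose metadata vocabulary.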