case for leveraging templates to systematically enhance otherwise unstructured
documents with structure to support subsequent uses for the document (e.g., for
knowledge discovery).
5.2.2 Natural Language Understanding
Appreciating that a significant component of the biomedical literature will likely
continue to be represented in narrative form, there will be continued demand for the
development and use of computational approaches to identify the knowledge
potentially embedded within it. The sheer volume of biomedical literature that needs
to be analyzed will perpetually necessitate the use of computational approaches. As
mentioned earlier, MEDLINE consists of more than 20 million citations. Even more
impressive is the growth rate of MEDLINE - currently exceeding 1.5 million articles
a year, up from 500,000 articles a year less than a decade ago. It is not inconceivable
that, with the growth of biomedical data generation, the interpretations embodied in
the biomedical literature will drive continued growth in annual MEDLINE entries.
This volume of text represents a challenge that will increasingly depend on
automated approaches to elicit the knowledge sequestered in textual form.
Natural language processing systems are built around algorithms that mediate
between unstructured data and human understanding [ 18 ]. Natural language
processing systems come in two flavors: (1) Natural Language Understanding
(NLU); and (2) Natural Language Generation (NLG). Both types of systems are rife
with challenges. The combination of NLU and NLG systems in fact embodies the
ultimate Turing test - where a human communicates with a computer in natural
language without being able to detect that the interlocutor is not human. For the
present discussion, we will focus on NLU systems, since they aim to extract
information from unstructured data such as that embodied in the biomedical
literature.
NLU systems are generally built on a combination of linguistic heuristics that
approximate human interpretation of concept recognition, grammar, and ultimately
the meaning connoted by text. At a high level, there are three major aspects of NLU:
(1) Lexical Analysis - identification of named concepts that can be matched to a
dictionary of terms; (2) Syntactic Analysis - identification of the syntax used to
encode grammar in the context of identified terms; and (3) Semantic Analysis -
identification of the concepts represented by identified terms. NLU systems have
been developed that focus on one or a combination of these major areas. The
inherent variety afforded by the power of natural language is also what continues to
drive the need for advanced research in the development of NLU systems.
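To make the three aspects concrete, the following is a minimal sketch in Python of a dictionary-based pipeline; the lexicon entries, semantic types, and function names are hypothetical illustrations, not part of any real NLU system, and a production system would use far richer grammars and terminologies (e.g., the UMLS).

```python
import re

# Toy dictionary of terms mapped to semantic types (hypothetical entries).
LEXICON = {
    "aspirin": "Pharmacologic Substance",
    "headache": "Sign or Symptom",
    "inhibits": "Functional Relation",
}

def lexical_analysis(text):
    """Lexical Analysis: match tokens against a dictionary of named concepts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(t, LEXICON[t]) for t in tokens if t in LEXICON]

def semantic_analysis(matches):
    """Semantic Analysis (crude): assemble a subject-relation-object triple
    from the matched concepts, using the relation term as the pivot."""
    subj = rel = obj = None
    for term, semtype in matches:
        if semtype == "Functional Relation":
            rel = term
        elif rel is None:
            subj = term
        else:
            obj = term
    return (subj, rel, obj)

matches = lexical_analysis("Aspirin inhibits headache in most patients.")
print(matches)  # matched concepts with their semantic types
print(semantic_analysis(matches))  # ('aspirin', 'inhibits', 'headache')
```

Syntactic analysis is elided here; in practice a parser would establish which grammatical roles the matched terms play before any semantic triple is asserted.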
The challenges faced by NLU systems notwithstanding, the potential to leverage
automated routines for extracting information from large volumes of text addresses
a key issue in leveraging potentially available knowledge. A recent exposition of
artificial intelligence supported by NLU is the Watson system developed by IBM.