The task of identifying relations between genes is further complicated by the variety
of linguistic forms in which a relationship can be expressed. The verbal phrases
that represent the relationship may be nominalized or in passive form. Often several
facts and relationships are concatenated or embedded in one sentence. Consider the
following sentence:
These TD IkappaB mutants almost completely inhibited the induction of monocyte
chemoattractant protein-1, interleukin-8, intercellular adhesion molecule-1, vascular
cell adhesion molecule-1, and E-selectin expression by TNF-alpha, whereas
interferon-gamma-mediated up-regulation of intercellular adhesion molecule-1 and
HLA-DR was not affected [28].
The following biological reactions are expressed:
1. Interferon-gamma mediates the up-regulation of intercellular adhesion molecule-1
2. Interferon-gamma mediates the up-regulation of HLA-DR
3. TD IkappaB mutants do NOT affect 1. and 2.
4. TD IkappaB mutants inhibit the induction of monocyte chemoattractant protein-1
etc.
From this example it can be seen that complex semantic and syntactic analysis is
needed to extract the relationships described by authors in biomedical documents.
Proposals in the literature for handling this task range from simplifying assumptions
to the use of full parsing.
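To make the target of such an extraction concrete, the reactions listed for the example sentence could be stored as structured records. The following is only a minimal sketch: the `Relation` class, its field names, and the predicate strings are hypothetical choices made for illustration and are not taken from any of the cited systems. It shows how an embedded fact (the mutants do NOT affect the first two reactions) can refer to other relations rather than to a plain entity name.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    """One extracted statement: agent --predicate--> target (hypothetical schema)."""
    agent: str
    predicate: str
    # The target is either an entity name or a tuple of embedded relations.
    target: Union[str, Tuple["Relation", ...]]
    negated: bool = False

    def render(self) -> str:
        """Return a readable one-line form of the relation."""
        if isinstance(self.target, tuple):
            target = " and ".join(f"({r.render()})" for r in self.target)
        else:
            target = self.target
        verb = f"NOT {self.predicate}" if self.negated else self.predicate
        return f"{self.agent} {verb} {target}"


# The reactions listed for the example sentence, entered by hand.
r1 = Relation("interferon-gamma", "mediates up-regulation of",
              "intercellular adhesion molecule-1")
r2 = Relation("interferon-gamma", "mediates up-regulation of", "HLA-DR")
r3 = Relation("TD IkappaB mutants", "affect", (r1, r2), negated=True)
r4 = Relation("TD IkappaB mutants", "inhibit induction of",
              "monocyte chemoattractant protein-1")

for relation in (r1, r2, r3, r4):
    print(relation.render())
```

Allowing a relation to serve as the target of another relation is what captures the embedding in the example. A flat list of entity pairs would lose the fact that the negation applies to the interferon-gamma reactions rather than to the molecules themselves.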
Earlier work in this field concentrated on extracting substance names
and other terms to build dictionaries or ontologies. In more recent research projects the
focus has shifted to extracting information about interactions and relations between
substances. For instance, [29] look for co-occurring gene names and assign those
genes a relation if they co-occur with statistically significant frequency, leaving out
the details of the relation. Much of the work reported so far focuses on extracting
protein-protein interactions. [30] describe a system that extracts protein-protein
interactions from MEDLINE abstracts. After locating the protein names, the system
tries to determine which protein is the “actor” (subject) and which is the “patient”
(object), and thus also extracts the direction of the interaction. A very pragmatic
approach is taken by [31], who create a gene-to-gene co-citation network for 13712 human
genes by automated analysis of titles and abstracts in over 10 million MEDLINE records. [32]
report on the adaptation of the general purpose IE system LaSIE to the biological
domain. The resulting systems, PASTA and EmPathIE, extract information about
protein structure and about enzymes and metabolic pathways, respectively. [33]
extract relations associated with the verbs activate, bind, interact, regulate, encode,
signal, and function. The system from [34] only extracts protein interactions
associated with the verb phrases interact with, associate with, and bind to. Another
interesting approach is that of [35]. They report on the system GeneScene, in which
they use preposition-based templates combined with a word classification using
WordNet 1.6. The average precision is 70%. However, the method has some
potential for improvement; moreover, it is not restricted to proteins or genes as
agents, and the verb phrases describing the interaction need not be pre-specified. [36]
report on preliminary results on using a full parser for the extraction of events from
the biomedical literature. An event can be viewed as an activity or occurrence of interest,
e.g. a biological reaction. This task is considerably more complex than the extraction of
interactions because it identifies the dependencies or sequences of events.
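The co-occurrence idea mentioned above, in which two genes are assigned a relation when they appear together with statistically significant frequency, can be sketched as follows. This is an illustration under stated assumptions, not the method of [29]: the text does not say which significance test is used, so a one-sided hypergeometric (Fisher-style) test is assumed here. The function names, the threshold `alpha`, and the toy document collection are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def cooccurrence_p_value(n_docs, n_a, n_b, n_both):
    """Probability of seeing at least n_both joint documents if genes A and B
    were distributed independently (upper tail of the hypergeometric law)."""
    total = comb(n_docs, n_b)
    tail = sum(comb(n_a, k) * comb(n_docs - n_a, n_b - k)
               for k in range(n_both, min(n_a, n_b) + 1))
    return tail / total


def related_gene_pairs(docs, alpha=0.05):
    """Return {(gene_a, gene_b): p_value} for pairs whose co-occurrence is
    significant at level alpha. Each document is a collection of gene names."""
    gene_counts = Counter()
    pair_counts = Counter()
    for genes in docs:
        unique = sorted(set(genes))          # count each gene once per document
        gene_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    significant = {}
    for (a, b), both in pair_counts.items():
        p = cooccurrence_p_value(len(docs), gene_counts[a], gene_counts[b], both)
        if p < alpha:
            significant[(a, b)] = p
    return significant


# Toy collection (invented for illustration): gene names found per abstract.
docs = [
    {"TNF", "ICAM1"}, {"TNF", "ICAM1"},
    {"TNF", "ICAM1", "HLA-DR"}, {"TNF", "ICAM1", "HLA-DR"},
    {"HLA-DR"}, {"HLA-DR"}, {"HLA-DR"},
    set(), set(), set(),
]
pairs = related_gene_pairs(docs)
print(pairs)
```

In this toy collection only the pair ICAM1 and TNF passes the threshold, because the two names always appear together. As in the approach described in the text, the result says nothing about the kind or direction of the relation. A realistic application would also need a correction for multiple testing, which is omitted here.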