1. In the first approach (Section 3.2), a relation extraction system is trained based on the subsequence kernel from [2]. This kernel is further generalized so that words can be replaced with word classes, thus enabling the use of information coming from POS tagging, named entity recognition, chunking, or WordNet [3].
2. In the second approach (Section 3.3), the representation is centered on the shortest dependency path between the two entities in the dependency graph of the sentence (a brief illustration follows this list). Because syntactic analysis is essential in this method, its applicability is limited to domains where syntactic parsing achieves reasonable accuracy.
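As a rough illustration of the second representation, the sketch below builds an undirected graph from a hypothetical dependency parse and reads off the shortest path between two entity words. The sentence, the parse, and the use of the networkx library are illustrative assumptions, not part of the method described here:

    import networkx as nx

    # Hypothetical (head, dependent) edges from some dependency parser
    # for: "Protesters seized several pumping stations, holding 127
    # Shell workers hostage."  The parse shown is illustrative only.
    edges = [("seized", "Protesters"), ("seized", "stations"),
             ("stations", "several"), ("stations", "pumping"),
             ("seized", "holding"), ("holding", "workers"),
             ("workers", "127"), ("workers", "Shell"),
             ("holding", "hostage")]

    g = nx.Graph(edges)  # undirected view of the dependency tree
    # shortest dependency path between the two entities of interest
    print(nx.shortest_path(g, "Protesters", "workers"))
    # ['Protesters', 'seized', 'holding', 'workers']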
Entity recognition, a prerequisite for relation extraction, is usually cast as a sequence
tagging problem, in which words are tagged as being either outside any entity, or
inside a particular type of entity. Most approaches to entity tagging are therefore
based on probabilistic models for labeling sequences, such as Hidden Markov Mod-
els [4], Maximum Entropy Markov Models [5], or Conditional Random Fields [6],
and obtain reasonably high accuracy. In the two information extraction methods presented in this chapter, we assume that entity recognition has already been performed and focus only on the relation extraction task.
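For concreteness, this in/out tagging is commonly written in the BIO scheme, where B-X begins an entity of type X, I-X continues it, and O marks tokens outside any entity. The following is a minimal sketch with made-up labels (the tagger that would produce them is assumed, not shown) recovering entity spans from such a tag sequence:

    # Illustrative BIO-tagged output: B-PROT opens a protein mention,
    # I-PROT continues it, O marks tokens outside any entity.
    tagged = [("the", "O"), ("Rad53", "B-PROT"), ("protein", "O"),
              ("binds", "O"), ("Cdc7", "B-PROT"), ("-", "I-PROT"),
              ("Dbf4", "I-PROT")]

    def bio_spans(tagged):
        """Recover (start, end, type) spans from BIO labels; end is exclusive."""
        spans, start, etype = [], None, None
        for i, (_, tag) in enumerate(tagged + [("", "O")]):  # sentinel flushes last span
            if start is not None and not tag.startswith("I-"):
                spans.append((start, i, etype))
                start = None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
        return spans

    print(bio_spans(tagged))  # [(1, 2, 'PROT'), (4, 7, 'PROT')]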
3.2 Subsequence Kernels for Relation Extraction
One of the first approaches to extracting interactions between proteins from biomedical abstracts is that of Blaschke et al., described in [7, 8]. Their system is based on a set of manually developed rules, where each rule (or frame) is a sequence of words (or POS tags) and two protein-name tokens. Between every two adjacent words is a number indicating the maximum number of intervening words allowed when matching the rule to a sentence. An example rule is "interaction of (3) <P> (3) with (3) <P>", where '<P>' is used to denote a protein name. A sentence matches the rule if and only if it satisfies the word constraints in the given order and respects the respective word gaps.
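The matching semantics just described can be made precise with a small backtracking matcher. The sketch below is an illustrative reconstruction, not Blaschke et al.'s implementation, and it assumes protein names have already been replaced by the token <P>:

    def rule_matches(words, gaps, tokens):
        """Backtracking matcher for Blaschke-style frames: the rule words
        must occur in tokens in the given order, with at most gaps[i]
        intervening tokens between words[i] and words[i+1]."""
        def place(i, lo, hi):
            # try every admissible position for rule word i, then recurse
            for pos in range(lo, min(hi, len(tokens) - 1) + 1):
                if tokens[pos] == words[i]:
                    if i == len(words) - 1 or place(i + 1, pos + 1, pos + 1 + gaps[i]):
                        return True
            return False
        # the first rule word may occur anywhere in the sentence
        return place(0, 0, len(tokens) - 1)

    # Encoding of the example rule "interaction of (3) <P> (3) with (3) <P>":
    rule_words = ["interaction", "of", "<P>", "with", "<P>"]
    rule_gaps = [0, 3, 3, 3]  # max intervening words after each rule word
    sent = "the interaction of <P> with the human <P> was reported".split()
    print(rule_matches(rule_words, rule_gaps, sent))  # True

Backtracking is needed because committing to the earliest occurrence of a word can wrongly reject a sentence in which only a later occurrence satisfies the gap constraint.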
In [9], the authors describe ELCS (Extraction using Longest Common Subsequences), a method that automatically learns such rules. ELCS' rule representation is similar to that in [7, 8], except that it does not currently use POS tags, but allows disjunctions of words. An example rule learned by this system is "- (7) interaction (0) [between | of] (5) <P> (9) <P> (17) .". Words in square brackets separated by '|' indicate disjunctive lexical constraints, i.e., one of the given words must match the sentence at that position. The numbers in parentheses between adjacent constraints indicate the maximum number of unconstrained words allowed between the two.
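The same backtracking scheme as in the previous sketch covers such disjunctive rules once each constraint is a set of admissible words; again a sketch, not the ELCS implementation:

    def elcs_rule_matches(constraints, gaps, tokens):
        """As rule_matches above, but each constraint is a set of words,
        which accommodates ELCS-style disjunctions."""
        def place(i, lo, hi):
            for pos in range(lo, min(hi, len(tokens) - 1) + 1):
                if tokens[pos] in constraints[i]:
                    if i == len(constraints) - 1 or place(i + 1, pos + 1, pos + 1 + gaps[i]):
                        return True
            return False
        return place(0, 0, len(tokens) - 1)

    # Encoding of the example rule
    # "- (7) interaction (0) [between | of] (5) <P> (9) <P> (17) .":
    constraints = [{"-"}, {"interaction"}, {"between", "of"},
                   {"<P>"}, {"<P>"}, {"."}]
    gaps = [7, 0, 5, 9, 17]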
3.2.1 Capturing Relation Patterns with a String Kernel
Both the system of Blaschke et al. and ELCS perform relation extraction based on a limited set of matching rules, where a rule is simply a sparse (gappy) subsequence of words or POS tags anchored on the two protein-name tokens. Therefore, the two methods share a common limitation: either through manual selection (Blaschke et al.), or as a result of a greedy learning procedure (ELCS), they end up using only a subset of all possible anchored sparse subsequences. Ideally, all such anchored sparse subsequences would be used as features, with weights reflecting their relative accuracy. However, this feature space is far too high-dimensional to enumerate explicitly; instead, the dot product between such feature vectors can be computed implicitly with a kernel.
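The sketch below follows the standard dynamic program for a gap-weighted subsequence kernel in the spirit of [2], with the word-class generalization mentioned earlier: each token is a set of features (word, POS tag, entity class), a pair of positions contributes the number of features the two tokens share, and every spanned position is penalized by a decay factor lam. The feature sets and the value of lam are illustrative assumptions:

    def subseq_kernel(s, t, n, lam=0.75):
        """Gap-weighted subsequence kernel of length n over sequences of
        feature sets; a sketch, not the chapter's implementation."""
        def c(x, y):
            return len(x & y)  # number of common features at a position pair

        m, l = len(s), len(t)
        # Kp[i][p][q] is the auxiliary K'_i over prefixes s[:p] and t[:q]
        Kp = [[[1.0 if i == 0 else 0.0 for _ in range(l + 1)]
               for _ in range(m + 1)] for i in range(n)]
        for i in range(1, n):
            for p in range(1, m + 1):
                kpp = 0.0  # running K''_i as the prefix of t grows
                for q in range(1, l + 1):
                    kpp = lam * kpp + lam * lam * c(s[p - 1], t[q - 1]) * Kp[i - 1][p - 1][q - 1]
                    Kp[i][p][q] = lam * Kp[i][p - 1][q] + kpp
        # sum contributions of all position pairs completing a length-n match
        return sum(lam * lam * c(s[p - 1], t[q - 1]) * Kp[n - 1][p - 1][q - 1]
                   for p in range(1, m + 1) for q in range(1, l + 1))

    # Illustrative token representations: word plus POS tag, with protein
    # names already replaced by the anchor token <P>.
    s = [{"interaction", "NN"}, {"of", "IN"}, {"<P>"}, {"with", "IN"}, {"<P>"}]
    t = [{"interactions", "NNS"}, {"between", "IN"}, {"<P>"}, {"and", "CC"}, {"<P>"}]
    print(subseq_kernel(s, t, n=3))  # similarity driven by shared POS tags and <P> anchors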