explicitly creating for each sentence a vector with a position for each such feature is
infeasible, due to the high dimensionality of the feature space. Here, we exploit dual
learning algorithms that process examples only via computing their dot-products,
such as in Support Vector Machines (SVMs) [10, 11]. An SVM learner tries to find
a hyperplane that separates positive from negative examples and at the same time
maximizes the separation (margin) between them. This type of max-margin sepa-
rator has been shown both theoretically and empirically to resist overfitting and to
provide good generalization performance on unseen examples.
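To make the dual formulation concrete, the following sketch shows how a custom relation kernel can be supplied to an off-the-shelf SVM through a precomputed Gram matrix. The scikit-learn API, the toy_kernel stand-in (which here just counts shared tokens rather than common anchored subsequences), and the PROT1/PROT2 placeholder tokens are illustrative assumptions, not part of the system described in the text.

```python
import numpy as np
from sklearn.svm import SVC

def toy_kernel(x, y):
    """Stand-in for the relation kernel: here simply the number of shared tokens;
    the real kernel counts common anchored subsequences between the two sentences."""
    return float(len(set(x) & set(y)))

def gram_matrix(rows, cols):
    """K[i, j] = kernel(rows[i], cols[j]); a dual learner needs only these values."""
    return np.array([[toy_kernel(a, b) for b in cols] for a in rows])

# Toy relation examples: tokenized sentences with PROT1/PROT2 marking the two mentions.
train = [["interaction", "of", "PROT1", "with", "PROT2"],
         ["PROT1", "is", "activated", "by", "PROT2"],
         ["PROT1", "was", "detected", "in", "PROT2", "cells"],
         ["levels", "of", "PROT1", "and", "PROT2", "were", "measured"]]
labels = [1, 1, 0, 0]

clf = SVC(kernel="precomputed").fit(gram_matrix(train, train), labels)

# Prediction needs the kernel values between the test examples and the training set.
test = [["activation", "of", "PROT1", "by", "PROT2"]]
print(clf.predict(gram_matrix(test, train)))
```

Because the learner touches examples only through these pairwise kernel values, the explicit high-dimensional feature vectors never need to be built.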
Computing the dot-product (i.e., the kernel) between the feature vectors associated
with two relation examples amounts to calculating the number of common anchored
subsequences between the two sentences. This is done efficiently by modifying the
dynamic programming algorithm used in the string kernel from [2] to account only
for common sparse subsequences constrained to contain the two protein-name tokens.
The feature space is further pruned down by utilizing the following property of
natural language statements: when a sentence asserts a relationship between two
entity mentions, it generally does so using one of the following four patterns:
[FB] Fore-Between: words before and between the two entity mentions are
simultaneously used to express the relationship. Examples: 'interaction of P1 with
P2,' 'activation of P1 by P2.'
[B] Between: only words between the two entities are essential for asserting
the relationship. Examples: 'P1 interacts with P2,' 'P1 is activated by P2.'
[BA] Between-After: words between and after the two entity mentions are
simultaneously used to express the relationship. Examples: 'P1 - P2 complex,'
'P1 and P2 interact.'
[M] Modifier: the two entity mentions have no words between them. Examples:
'U.S. troops' (a Role:Staff relation), 'Serbian general' (Role:Citizen).
While the first three patterns are sufficient to capture most cases of interactions
between proteins, the last pattern is needed to account for various relationships
expressed through noun-noun or adjective-noun compounds in the newspaper corpora.
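As a minimal illustration of how a sentence decomposes into the segments these patterns draw on, the sketch below splits a tokenized sentence into its fore, between, and after parts relative to the two entity mentions. The PROT1/PROT2 placeholder names and the segment function are hypothetical, not taken from the text.

```python
def segment(tokens, e1="PROT1", e2="PROT2"):
    """Split a tokenized sentence into the fore, between, and after segments
    defined by the two entity mentions (placeholder tokens are an assumption)."""
    i, j = tokens.index(e1), tokens.index(e2)
    if i > j:
        i, j = j, i
    return tokens[:i], tokens[i + 1:j], tokens[j + 1:]

fore, between, after = segment(["interaction", "of", "PROT1", "with", "PROT2"])
print(fore, between, after)   # ['interaction', 'of'] ['with'] []
# An FB pattern draws words from fore + between, a B pattern only from between,
# a BA pattern from between + after, and M applies when 'between' is empty.
```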
Another observation is that all these patterns use at most four words to express
the relationship (not counting the two entity names). Consequently, when computing
the relation kernel, we restrict the counting of common anchored subsequences only
to those having one of the four types described above, with a maximum word-length
of four. This type of feature selection leads not only to a faster kernel computation,
but also to less overfitting, which results in increased accuracy.
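The counting itself can be illustrated with a simplified dynamic program that tallies the common sparse (possibly non-contiguous) subsequences of length at most four shared by two word sequences. The sketch below omits the gap-decay weighting and the anchoring and pattern-type constraints of the actual relation kernel; it only shows the core recurrence.

```python
def count_common_subsequences(s, t, max_len=4):
    """Number of matching (possibly sparse) subsequence pairs of length 1..max_len
    shared by word sequences s and t.  Simplified: no gap-decay weighting."""
    n, m = len(s), len(t)
    # C[k][i][j]: matching subsequence pairs of length k within s[:i] and t[:j].
    C = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(max_len + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            C[0][i][j] = 1                      # the empty subsequence always matches
    for k in range(1, max_len + 1):
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                C[k][i][j] = (C[k][i - 1][j] + C[k][i][j - 1] - C[k][i - 1][j - 1]
                              + (C[k - 1][i - 1][j - 1] if s[i - 1] == t[j - 1] else 0))
    return sum(C[k][n][m] for k in range(1, max_len + 1))

# Two 'between' segments sharing the sparse subsequence "interacts ... with":
print(count_common_subsequences(["interacts", "directly", "with"],
                                ["interacts", "with"]))   # 3: 'interacts', 'with', 'interacts with'
```

Restricting k to at most four words mirrors the observation above and keeps the dynamic programming table small.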
The patterns enumerated above are completely lexicalized and consequently their
performance is limited by data sparsity. This can be alleviated by categorizing words
into classes with varying degrees of generality, and then allowing patterns to use both
words and their classes. Examples of word classes are POS tags and generalizations
over POS tags such as Noun, Active Verb, or Passive Verb. The entity type can
also be used if the word is part of a known named entity. Also, if the sentence is
segmented into syntactic chunks such as noun phrases (NP) or verb phrases (VP),
the system may choose to consider only the head word from each chunk, together
with the type of the chunk as another word class. Content words such as nouns and
verbs can also be related to their synsets via WordNet. Patterns then will consist
of sparse subsequences of words, POS tags, generalized POS tags, entity and chunk
types, or WordNet synsets. For example, 'Noun of P 1 by
P 2 ' is an FB pattern
based on words and general POS tags.
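One simple way to realize this generalization, sketched below under assumed names rather than the exact formulation in the text, is to attach to each token the set of word classes it belongs to and, when two positions are compared, to count how many classes they share; positions can then match on 'Noun' or on an entity type even when the surface words differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    word: str
    pos: str                # e.g. 'NN', 'VBZ'
    general_pos: str        # e.g. 'Noun', 'ActiveVerb', 'PassiveVerb'
    entity_type: str = ""   # e.g. 'Protein' when part of a known named entity

    def features(self):
        """Word classes attached to this position: the word plus its generalizations
        (chunk types or WordNet synsets could be added the same way)."""
        feats = {self.word, self.pos, self.general_pos}
        if self.entity_type:
            feats.add(self.entity_type)
        return feats

def common_feature_count(x: Token, y: Token) -> int:
    """Number of word classes two positions share; in a generalized kernel this count
    can replace the 0/1 test of exact word identity."""
    return len(x.features() & y.features())

print(common_feature_count(Token("activation", "NN", "Noun"),
                           Token("interaction", "NN", "Noun")))   # 2: shared 'NN' and 'Noun'
```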