The shortest-path dependency kernels outperform the dependency kernel from
[17] in both scenarios, with a more substantial gain for SP-CFG. An error analy-
sis revealed that Collins' parser was better at capturing local dependencies, hence
the increased accuracy of SP-CFG. Another advantage of shortest-path dependency
kernels is that training and testing are very fast, since each sentence is represented as a chain of dependencies on which a fast kernel can be computed. All four SP kernels from Table 3.2 take between 2 and 3 hours to train and test on a 2.6 GHz Pentium IV machine.
As expected, parsing errors are less frequent on the newspaper articles from ACE than on the biomedical articles from AIMed. Consequently, the extracted dependency structures are more accurate, leading to improved accuracy for the dependency kernel.
To avoid numerical problems, the dependency paths are constrained to pass
through at most 10 words (as observed in the training data) by setting the kernel
to 0 for longer paths. The alternative solution of normalizing the kernel leads to
a slight decrease in accuracy. The fact that longer paths have larger kernel scores
in the unnormalized version does not pose a problem because, by definition, paths
of different lengths correspond to disjoint sets of features. Consequently, the SVM
algorithm will induce lower weights for features occurring in longer paths, resulting
in a linear separator that works irrespective of the size of the dependency paths.
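The computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature sets per path position (word, part-of-speech tag, word class) and the function names are assumptions; only the product structure, the zero kernel for paths of different lengths, and the 10-word cap come from the text.

```python
# Sketch of a shortest-path dependency kernel: each path is a list of
# feature sets, one per position on the dependency chain. The kernel is
# the product over positions of the number of common features.

MAX_PATH_WORDS = 10  # cap observed in the training data; longer paths get 0


def sp_kernel(path_x, path_y):
    """Kernel between two dependency paths (lists of feature sets)."""
    # Paths of different lengths correspond to disjoint feature sets,
    # so by definition their kernel is 0.
    if len(path_x) != len(path_y):
        return 0
    # Avoid numerical problems on very long (unnormalized) paths.
    if len(path_x) > MAX_PATH_WORDS:
        return 0
    k = 1
    for fx, fy in zip(path_x, path_y):
        k *= len(fx & fy)  # number of features shared at this position
    return k


# Hypothetical example: two 3-position paths with word, POS tag, and
# word-class features; the middle position holds the edge direction.
p1 = [{"his", "PRP"}, {"->"}, {"actions", "NNS", "Noun"}]
p2 = [{"her", "PRP"}, {"->"}, {"protest", "NN", "Noun"}]
print(sp_kernel(p1, p2))  # 1 * 1 * 1 = 1
```

Because the kernel is a simple product along the chain, it runs in time linear in the path length, which is consistent with the fast training and testing times reported above.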
3.5 Future Work
There are cases when words that do not belong to the shortest dependency path do
influence the extraction decision. In Section 3.3.2, we showed how negative polarity
items are integrated in the model through annotations of words along the dependency paths. Modality is another phenomenon that influences relation extraction, and we plan to incorporate it using the same annotation approach.
The two relation extraction methods are very similar: the subsequence patterns
in one kernel correspond to dependency paths in the second kernel. More precisely, pairs of words from a subsequence pattern correspond to pairs of consecutive words (i.e., edges) on the dependency path. Because the subsequence kernel lacks dependency information, it must allow gaps between words, with the corresponding exponential penalty factor λ. Given the observed similarity between the two methods, it seems reasonable to combine them in an integrated model. This model would use high-confidence head-modifier dependencies, falling back on pairs of words with gaps when the dependency information is unreliable.
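The exponential gap penalty mentioned above can be illustrated with a short sketch. This is a simplified assumption-laden illustration of how subsequence kernels typically weight matches, not the chapter's code: the function name and the particular value of λ are made up for the example.

```python
# Illustrative gap penalty in a subsequence kernel: a matched subsequence
# is weighted by lambda ** span, where span is the total stretch of the
# sentence it occupies, so gaps between its words are penalized
# exponentially.

LAMBDA = 0.75  # decay factor, 0 < lambda <= 1 (value chosen for illustration)


def subsequence_weight(positions):
    """Weight of one subsequence match, given the sorted sentence
    positions of its matched words."""
    span = positions[-1] - positions[0] + 1
    return LAMBDA ** span


# A contiguous word pair (an "edge", as on a dependency path) is
# penalized less than the same pair matched with a gap between them.
print(subsequence_weight([3, 4]))  # lambda**2 = 0.5625
print(subsequence_weight([3, 7]))  # lambda**5 = 0.2373046875
```

In an integrated model along the lines suggested above, a high-confidence dependency edge would play the role of the gap-free pair, while gapped pairs with their λ penalty would serve as the fallback when the parse is unreliable.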
3.6 Conclusion
Mining knowledge from text documents can benefit from using the structured information that comes from entity recognition and relation extraction. However, accurately extracting relationships between relevant entities depends on the granularity and reliability of the required linguistic analysis. In this chapter, we presented
two relation extraction kernels that differ in terms of the amount of linguistic infor-
mation they use. Experimental evaluations on two corpora with different types of
discourse show that they compare favorably to previous extraction approaches.