The shortest-path dependency kernels outperform the dependency kernel from
[17] in both scenarios, with a more substantial gain for SP-CFG. An error analy-
sis revealed that Collins' parser was better at capturing local dependencies, hence
the increased accuracy of SP-CFG. Another advantage of shortest-path dependency
kernels is that training and testing are very fast, since each sentence is represented as a chain of dependencies on which a fast kernel can be computed. All four SP kernels from Table 3.2 take between 2 and 3 hours to train and test on a 2.6 GHz Pentium IV machine.
As expected, parsing errors are less frequent on the newspaper articles from ACE than on the biomedical articles from AIMed. Consequently, the extracted dependency structures are more accurate, leading to improved accuracy for the dependency kernel.
To avoid numerical problems, the dependency paths are constrained to pass
through at most 10 words (as observed in the training data) by setting the kernel
to 0 for longer paths. The alternative solution of normalizing the kernel leads to
a slight decrease in accuracy. The fact that longer paths have larger kernel scores
in the unnormalized version does not pose a problem because, by definition, paths
of different lengths correspond to disjoint sets of features. Consequently, the SVM
algorithm will induce lower weights for features occurring in longer paths, resulting
in a linear separator that works irrespective of the size of the dependency paths.
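The computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature sets per path position (word, part-of-speech tag, word class) and the function names are assumptions; only the product structure, the zero kernel for paths of different lengths, and the 10-word cap come from the text.

```python
# Sketch of a shortest-path dependency kernel: each path is a list of
# feature sets, one per position on the dependency chain. The kernel is
# the product over positions of the number of common features.

MAX_PATH_WORDS = 10  # cap observed in the training data; longer paths get 0


def sp_kernel(path_x, path_y):
    """Kernel between two dependency paths (lists of feature sets)."""
    # Paths of different lengths correspond to disjoint feature sets,
    # so by definition their kernel is 0.
    if len(path_x) != len(path_y):
        return 0
    # Avoid numerical problems on very long (unnormalized) paths.
    if len(path_x) > MAX_PATH_WORDS:
        return 0
    k = 1
    for fx, fy in zip(path_x, path_y):
        k *= len(fx & fy)  # number of features shared at this position
    return k


# Hypothetical example: two 3-position paths with word, POS tag, and
# word-class features; the middle position holds the edge direction.
p1 = [{"his", "PRP"}, {"->"}, {"actions", "NNS", "Noun"}]
p2 = [{"her", "PRP"}, {"->"}, {"protest", "NN", "Noun"}]
print(sp_kernel(p1, p2))  # 1 * 1 * 1 = 1
```

Because the kernel is a simple product along the chain, it runs in time linear in the path length, which is consistent with the fast training and testing times reported above.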
3.5 Future Work
There are cases when words that do not belong to the shortest dependency path do
influence the extraction decision. In Section 3.3.2, we showed how negative polarity
items are integrated in the model through annotations of words along the dependency paths. Modality is another phenomenon that influences relation extraction, and we plan to incorporate it using the same annotation approach.
The two relation extraction methods are very similar: the subsequence patterns
in one kernel correspond to dependency paths in the second kernel. More precisely, pairs of words from a subsequence pattern correspond to pairs of consecutive words (i.e., edges) on the dependency path. Because the subsequence kernel lacks dependency information, it must allow gaps between words, with the corresponding exponential penalty factor λ. Given the observed similarity between the two methods, it seems reasonable to combine them in an integrated model. This model would use high-confidence head-modifier dependencies, falling back on pairs of words with gaps when the dependency information is unreliable.
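The exponential gap penalty mentioned above can be illustrated with a short sketch. This is a simplified assumption-laden illustration of how subsequence kernels typically weight matches, not the chapter's code: the function name and the particular value of λ are made up for the example.

```python
# Illustrative gap penalty in a subsequence kernel: a matched subsequence
# is weighted by lambda ** span, where span is the total stretch of the
# sentence it occupies, so gaps between its words are penalized
# exponentially.

LAMBDA = 0.75  # decay factor, 0 < lambda <= 1 (value chosen for illustration)


def subsequence_weight(positions):
    """Weight of one subsequence match, given the sorted sentence
    positions of its matched words."""
    span = positions[-1] - positions[0] + 1
    return LAMBDA ** span


# A contiguous word pair (an "edge", as on a dependency path) is
# penalized less than the same pair matched with a gap between them.
print(subsequence_weight([3, 4]))  # lambda**2 = 0.5625
print(subsequence_weight([3, 7]))  # lambda**5 = 0.2373046875
```

In an integrated model along the lines suggested above, a high-confidence dependency edge would play the role of the gap-free pair, while gapped pairs with their λ penalty would serve as the fallback when the parse is unreliable.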
3.6 Conclusion
Mining knowledge from text documents can benefit from using the structured information that comes from entity recognition and relation extraction. However, accurately extracting relationships between relevant entities depends on the granularity and reliability of the required linguistic analysis. In this chapter, we presented
two relation extraction kernels that differ in terms of the amount of linguistic infor-
mation they use. Experimental evaluations on two corpora with different types of
discourse show that they compare favorably to previous extraction approaches.