[SPK] : This is the shortest path dependency kernel, using the head-modifier
dependencies extracted by Collins' syntactic parser. The kernel is trained and tested
on the same 10 splits as ELCS and SSK.
The Precision-Recall curves that show the trade-off between these metrics are
obtained by varying a threshold on the minimum acceptable extraction confidence,
based on the probability estimates from LibSVM. The results, summarized in Figure 3.7,
show that the subsequence kernel outperforms the other three systems by a substantial
margin. The syntactic parser, which was originally trained on a newspaper corpus, builds
less accurate dependency structures for the biomedical text. This is reflected in a
significantly reduced accuracy for the dependency kernel.
[Figure 3.7: precision (%) plotted against recall (%), both on a 0 to 100 scale, with one curve each for SSK, Manual, ELCS, and SPK.]
Fig. 3.7. Precision-Recall curves for protein interaction extractors.
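The threshold sweep behind such curves can be illustrated with a minimal sketch. It assumes each candidate extraction comes with a confidence score (in the experiments above, a probability estimate from LibSVM) and a gold label; the function name and the toy scores below are hypothetical and are not taken from the evaluation itself.

```python
from typing import List, Tuple


def precision_recall_curve(
    confidences: List[float],
    labels: List[bool],
) -> List[Tuple[float, float, float]]:
    """Return one (threshold, precision, recall) point per distinct confidence.

    An extraction is accepted when its confidence is >= the threshold.
    """
    total_positives = sum(labels)
    points = []
    # Lowering the threshold accepts more extractions: recall can only
    # grow, while precision may rise or fall.
    for threshold in sorted(set(confidences), reverse=True):
        accepted = [lab for conf, lab in zip(confidences, labels) if conf >= threshold]
        true_positives = sum(accepted)
        # accepted is never empty: the threshold is one of the confidences.
        precision = true_positives / len(accepted)
        recall = true_positives / total_positives if total_positives else 0.0
        points.append((threshold, precision, recall))
    return points


# Hypothetical toy data: five candidate extractions with gold labels.
confidences = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False]
curve = precision_recall_curve(confidences, labels)
for threshold, precision, recall in curve:
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Each distinct confidence value yields one point on the curve; plotting precision against recall for all points gives a trade-off curve of the kind shown in the figure.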
3.4.2 Relation Extraction from ACE
The two kernels are also evaluated on the task of extracting top-level relations
from the ACE corpus [12], the version used for the September 2002 evaluation.
The training part of this dataset consists of 422 documents, with a separate set of
97 documents reserved for testing. This version of the ACE corpus contains three
types of annotations: coreference, named entities and relations. There are five types
of entities - Person, Organization, Facility, Location, and Geo-Political
Entity - which can participate in five general, top-level relations: Role, Part,
Located, Near, and Social. In total, there are 7,646 intra-sentential relations, of
which 6,156 are in the training data and 1,490 in the test data.