4.5.3 Bootstrapping Based on Other Sets
During the annotation of the two benchmark datasets, we noticed that the two
subcorpora differ in nature: set 1 (Isolation Current) contains evaluations of
numerical measurements performed on the three phases of the machine, while set 2
(Wedging System) describes visual inspections of the wedging components of the
machine. Nevertheless, they very often had the frames Observation or Change in
common, whereas the frame Evidence appeared almost only in the first set and the
frame Activity almost always in the second. We therefore tested whether text
annotated with the same roles in one set could bootstrap the learning in the
other. The results are summarized in Table 4.8.
Table 4.8. Results for bootstrapping based on other labeled sets
Training File        Testing File         Recall    Precision
Isolation Current    Wedging System       0.765     0.859
Wedging System       Isolation Current    0.642     0.737
We consider these results very promising, since they hint at the possibility
of using previously annotated text from other subcorpora to bootstrap the learning
process, which would reduce the effort of acquiring manual annotations
for new text.
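To make the two measures in Table 4.8 concrete, the following is a minimal sketch of how recall and precision could be computed when a model trained on one set is evaluated on the other. It assumes the standard definitions (recall = correct / gold annotations, precision = correct / predicted annotations) and assumes that each annotation is represented as a (sentence, constituent, role) triple. The function name, the data layout, the role labels, and the toy data are all hypothetical illustrations and are not taken from the framework described in this chapter.

```python
from typing import Hashable, Set, Tuple


def recall_precision(gold: Set[Hashable], predicted: Set[Hashable]) -> Tuple[float, float]:
    """Return (recall, precision) of predicted role annotations against gold ones.

    An annotation counts as correct only if the identical triple
    occurs in both sets (assumed matching criterion).
    """
    correct = len(gold & predicted)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision


# Hypothetical toy data: (sentence id, constituent id, role label).
# The role labels below are placeholders, not the roles used in the chapter.
gold = {
    (1, "c1", "RoleA"),
    (1, "c2", "RoleB"),
    (2, "c1", "RoleA"),
    (2, "c3", "RoleB"),
}
predicted = {
    (1, "c1", "RoleA"),  # correct
    (1, "c2", "RoleB"),  # correct
    (2, "c3", "RoleA"),  # right constituent, wrong role: counted as an error
}

recall, precision = recall_precision(gold, predicted)
print(f"recall={recall:.3f} precision={precision:.3f}")
```

Under this reading, a recall below the precision, as in both rows of Table 4.8, would mean that the transferred model misses some gold annotations but is fairly reliable for the annotations it does propose.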
4.6 Conclusions
In this chapter, we have presented an approach for extracting knowledge from text
documents containing descriptions of knowledge tasks in a technical domain.
Knowledge extraction in our approach is based on the annotation of text with
knowledge roles (a concept originating in knowledge engineering), which we map to
semantic roles found in frame semantics. The framework implemented for this purpose is
based on deep NLP and active learning. Experiments have demonstrated a robust
learning performance, and the obtained annotations were of high quality. Since our
framework is inspired by and founded upon research in semantic role labeling (SRL),
the results indicate that SRL could become a highly valuable processing step for text
mining tasks.
In future work, we will consider the advantages of representing annotated text by
means of knowledge roles and the related frames. Besides the previously explained
uses for semantic retrieval of cases and the extraction of empirical domain knowledge
facts, such a representation could also permit looking for potentially interesting
relations in text and could be exploited to populate application and domain ontologies
with lexical items.