Fig. 4.15. Parse tree of the sentences in Figure 4.14.
Step e) : The committee of classifiers consists of a maximum entropy (MaxEnt)
classifier from Mallet [19], a Winnow classifier from SNoW [2], and a memory-based
learner (MBL) from TiMBL [6]. For the MBL, we set k=5 as the number of nearest neighbours. The classification is performed as follows: if at least two classifiers agree on a label, that label is accepted. If there is disagreement, the cluster of labels from the five nearest neighbours is examined. If the cluster is not homogeneous (i.e., it contains different labels), the instance is included in the set of instances to be presented to the user for manual labeling.
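A minimal sketch of this decision logic is given below, assuming the three classifiers have already produced their labels. The function committee_decision, its arguments, and the fallback to the MBL label when the neighbour cluster is homogeneous (a case the description above leaves open) are illustrative assumptions, not part of the toolkits cited above.

from collections import Counter

def committee_decision(label_maxent, label_winnow, label_mbl, neighbour_labels):
    """Combine the three classifier outputs as described in step e).

    neighbour_labels holds the labels of the k=5 nearest neighbours found by the MBL.
    Returns (label, needs_manual_labeling).
    """
    votes = Counter([label_maxent, label_winnow, label_mbl])
    label, count = votes.most_common(1)[0]
    if count >= 2:
        # at least two classifiers agree on a label: accept it
        return label, False
    # complete disagreement: examine the cluster of the five nearest neighbours
    if len(set(neighbour_labels)) == 1:
        # homogeneous cluster: assume the MBL label can be accepted
        # (the text does not state this case explicitly)
        return neighbour_labels[0], False
    # heterogeneous cluster: present the instance to the user for manual labeling
    return None, True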
Step f) : If new sentences for manual annotation are selected based only on the output of the committee of classifiers, the risk of selecting outlier sentences is high [29]. Thus, from the set of instances flagged by the committee, we select those belonging to large clusters that have not been manually labeled yet.
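A sketch of this selection step follows, under the assumption that each instance can be mapped to a cluster identifier. The names select_for_annotation, cluster_of, labeled_clusters, and the min_cluster_size threshold are hypothetical and introduced here only for illustration.

from collections import defaultdict

def select_for_annotation(hard_instances, cluster_of, labeled_clusters, min_cluster_size=10):
    """Keep only those hard instances that lie in large clusters without manual labels.

    hard_instances: instances the committee could not label in step e).
    cluster_of: maps an instance to its cluster identifier.
    labeled_clusters: identifiers of clusters that already contain manually labeled data.
    """
    by_cluster = defaultdict(list)
    for instance in hard_instances:
        by_cluster[cluster_of(instance)].append(instance)
    selected = []
    for cluster_id, members in by_cluster.items():
        if cluster_id in labeled_clusters:
            continue  # this cluster is already covered by manual labels
        if len(members) >= min_cluster_size:
            # large clusters are unlikely to consist of outlier sentences
            selected.extend(members)
    return selected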
4.5 Evaluations
To evaluate this active learning approach on the task of annotating text with knowl-
edge roles, we performed a series of experiments that are described in the following.
As explained in Section 4.4.1, we created subcorpora, based on the XML structure of the documents, containing text that belongs to different types of diagnostic tests. After these subcorpora were processed into sentences, only unique sentences were retained for further processing (repetitive, standard sentences do not add any new information and only disturb the learning, so they were discarded). Then, lists of verbs were created and, by consulting the sources mentioned in Section 4.3.3, the verbs were grouped under one of the frames Observation, Evidence, Activity, and Change. Verbs that did not belong to any of these frames were not considered for role labeling.
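The preparation just described can be sketched as follows; the verb lexicon below is a toy stand-in for the lists compiled from the sources in Section 4.3.3, and main_verb is a hypothetical helper that extracts the main verb of a sentence.

# Illustrative corpus preparation: deduplicate sentences and group them by the
# frame of their main verb.
frame_of_verb = {
    "observe": "Observation",
    "show": "Evidence",
    "replace": "Activity",
    "increase": "Change",
}

def prepare_subcorpus(sentences, main_verb):
    unique = list(dict.fromkeys(sentences))        # drop repetitive, standard sentences
    grouped = {}
    for sentence in unique:
        frame = frame_of_verb.get(main_verb(sentence))
        if frame is not None:                      # verbs outside the four frames are ignored
            grouped.setdefault(frame, []).append(sentence)
    return unique, grouped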
4.5.1 Learning Performance on the Benchmark Datasets
With the aim of exploring the corpus to identify roles for the frames, and using our learning framework, we annotated two different subcorpora and then verified them manually to create benchmark datasets for evaluation. Some statistics for the manually annotated subcorpora are summarized in Table 4.4. Then, to evaluate the efficiency of the classification, we performed 10-fold cross-validations on each set, obtaining the results shown in Table 4.5, where recall, precision, and the F_{β=1} measure are the standard metrics of information retrieval.
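As an illustration of the evaluation protocol only, the following sketch runs a 10-fold cross-validation with scikit-learn on synthetic placeholder data, using logistic regression as a stand-in for a MaxEnt classifier; the actual experiments were run with the Mallet, SNoW, and TiMBL classifiers on the annotated subcorpora of Table 4.4.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression          # stand-in for a MaxEnt classifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_fscore_support

# Placeholder data; the real evaluation uses the manually verified role-labeled instances.
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

# 10-fold cross-validation, scored with recall, precision, and F (with beta = 1).
predictions = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=10)
precision, recall, f1, _ = precision_recall_fscore_support(y, predictions, average="micro")
print(f"P = {precision:.3f}, R = {recall:.3f}, F_beta=1 = {f1:.3f}")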
We analyzed some of the classification errors and found that they were due to parsing anomalies, which had on several occasions forced us to split a role across several constituents.