a) Divide the corpus into clusters of sentences with the same target verb. If a cluster
has fewer sentences than a given threshold, group sentences with verbs evoking
the same frame into the same cluster.
b) Within each cluster, group together the sentences (or clauses) with the same parse
sub-tree.
c) Select sentences from the largest groups of the largest clusters and present them
to the user for annotation.
d) Bootstrap initialization: apply the labels assigned by the user to groups of
sentences with the same parse sub-tree.
e) Train all the classifiers of the committee on the labeled instances; apply each
trained classifier to the unlabeled sentences.
f) Get a pool of instances where the classifiers of the committee disagree and
present to the user the instances belonging to sentences from the next largest
clusters not yet manually labeled.
g) Repeat steps d)-f) a few times until a desired accuracy of classification is
achieved.
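The selection in step f) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name select_queries, the dictionary layout of the committee predictions, and the example cluster names and sizes are all assumptions made for the sketch.

```python
def select_queries(predictions, cluster_of, cluster_size, labeled_clusters, k):
    """Pick up to k instances for the next round of manual annotation.

    predictions      -- instance id -> list of labels, one per committee member
    cluster_of       -- instance id -> id of the cluster the instance belongs to
    cluster_size     -- cluster id -> number of sentences in that cluster
    labeled_clusters -- cluster ids the user has already labeled manually
    (All four structures are hypothetical; the source does not specify them.)
    """
    # The pool holds every instance on which the committee does not agree.
    pool = [i for i, votes in predictions.items() if len(set(votes)) > 1]
    # Keep only instances from clusters that were not yet manually labeled.
    pool = [i for i in pool if cluster_of[i] not in labeled_clusters]
    # Instances from the largest remaining clusters come first.
    pool.sort(key=lambda i: cluster_size[cluster_of[i]], reverse=True)
    return pool[:k]


predictions = {
    "s1": ["Finding", "Finding", "Finding"],          # full agreement
    "s2": ["Finding", "Observed Object", "Finding"],  # disagreement
    "s3": ["Cause", "Finding", "Finding"],            # disagreement
    "s4": ["Cause", "Finding", "Observed Object"],    # disagreement
}
cluster_of = {"s1": "c1", "s2": "c1", "s3": "c2", "s4": "c3"}
cluster_size = {"c1": 40, "c2": 25, "c3": 10}

queries = select_queries(predictions, cluster_of, cluster_size, {"c1"}, k=2)
```

Here s2 is skipped although the committee disagrees on it, because its cluster was already labeled in an earlier round.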
In the following, the rationale behind choosing these steps is explained.
Steps a), b), c) : In these steps, statistics about the syntactical structure of the
corpus are created, with the intention of capturing its underlying distribution, so
that representative instances for labeling can be selected.
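Steps a) to c) might be sketched as below. The sketch assumes that each sentence is already parsed and carries its target verb and a hashable signature of its parse sub-tree; the field names, the verb_to_frame mapping, and the reading of step a) in which small verb clusters are merged by frame are assumptions for illustration.

```python
from collections import defaultdict


def cluster_sentences(sentences, verb_to_frame, min_size):
    """Step a): cluster by target verb; small clusters are merged by frame."""
    by_verb = defaultdict(list)
    for s in sentences:
        by_verb[s["verb"]].append(s)
    clusters = defaultdict(list)
    for verb, members in by_verb.items():
        if len(members) >= min_size:
            key = ("verb", verb)
        else:
            # Too few sentences: fall back to the frame evoked by the verb.
            key = ("frame", verb_to_frame[verb])
        clusters[key].extend(members)
    return dict(clusters)


def group_by_subtree(cluster):
    """Step b): group the sentences of one cluster by parse sub-tree."""
    groups = defaultdict(list)
    for s in cluster:
        groups[s["subtree"]].append(s)
    return dict(groups)


def select_for_annotation(clusters, n_clusters, n_groups):
    """Step c): one sentence from each of the largest groups of the largest clusters."""
    picked = []
    largest = sorted(clusters.values(), key=len, reverse=True)[:n_clusters]
    for cluster in largest:
        groups = sorted(group_by_subtree(cluster).values(), key=len, reverse=True)
        picked.extend(group[0]["id"] for group in groups[:n_groups])
    return picked


# Toy corpus; verbs other than feststellen and their frames are invented.
sentences = [
    {"id": 1, "verb": "feststellen", "subtree": ("PP", "NP")},
    {"id": 2, "verb": "feststellen", "subtree": ("PP", "NP")},
    {"id": 3, "verb": "feststellen", "subtree": ("NP",)},
    {"id": 4, "verb": "beobachten", "subtree": ("PP", "NP")},
    {"id": 5, "verb": "erkennen", "subtree": ("NP",)},
]
verb_to_frame = {
    "feststellen": "Observation",
    "beobachten": "Observation",
    "erkennen": "Observation",
}

clusters = cluster_sentences(sentences, verb_to_frame, min_size=2)
to_annotate = select_for_annotation(clusters, n_clusters=1, n_groups=2)
```

Counting group sizes in this way is what lets the selection favor structures that occur often in the corpus.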
Step d) : This step is applicable to our corpus because of the nature of the text.
Our corpus contains repetitive descriptions of the same diagnostic measurements
on electrical machines, and often even the language used is repetitive. This does
not mean that the same words are repeated (although standard formulations are
often used, especially when nothing of value was observed). Rather, the sentences
used to describe the task share the same syntactic structure. As an example,
consider the sentences shown in Figure 4.14.
[ PP Im Nutaustrittsbereich] wurden [ NP stärkere Glimmentladungsspuren] festgestellt.
In the area of slot exit stronger signs of corona discharges were detected.
[ PP Bei den Endkeilen] wurde [ NP ein ausreichender Verkeildruck] festgestellt.
At the terminals' end a sufficient wedging pressure was detected.
[ PP An der Schleifringbolzenisolation] wurden [ NP mechanische Beschädigungen] festgestellt.
On the insulation of slip rings mechanical damages were detected.
[ PP Im Wickelkopfbereich] wurden [ NP grossflächige Decklackablätterungen] festgestellt.
In the winding head area extensive chippings of the top coating were detected.
Fig. 4.14. Examples of sentences with the same structure.
What all these sentences have in common is the passive form of the verb feststellen
(wurde/wurden festgestellt), and due to the subcategorization of this verb, the parse
tree on the level of phrases is identical for all sentences, as indicated by Figure 4.15.
Furthermore, for the frame Observation evoked by the verb, the assigned roles are in
all cases: NP—Finding, PP—Observed Object. Thus, to bootstrap initialization, we
assign the same roles to sentences with the same sub-tree as the manually labeled
sentences.
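The bootstrap initialization can be illustrated with a short sketch. It assumes that a parse sub-tree is reduced to a tuple of phrase labels and that a role assignment maps each phrase label to a role name; both representations, and the function name bootstrap_labels, are hypothetical and chosen only for the example.

```python
def bootstrap_labels(labeled, unlabeled):
    """Copy role assignments to unlabeled sentences with a matching sub-tree.

    labeled   -- list of (subtree, roles) pairs from manual annotation
    unlabeled -- list of dicts with the keys 'id' and 'subtree'
    Returns a dict: sentence id -> roles, for the sentences that matched.
    """
    roles_by_subtree = {subtree: roles for subtree, roles in labeled}
    return {
        s["id"]: roles_by_subtree[s["subtree"]]
        for s in unlabeled
        if s["subtree"] in roles_by_subtree
    }


# One manually labeled sentence with the structure seen in Figure 4.14.
labeled = [(("PP", "NP"), {"NP": "Finding", "PP": "Observed Object"})]
unlabeled = [
    {"id": "s2", "subtree": ("PP", "NP")},        # same structure, gets labels
    {"id": "s3", "subtree": ("NP", "PP", "NP")},  # different structure, skipped
]

propagated = bootstrap_labels(labeled, unlabeled)
```

Sentences whose sub-tree has no manually labeled counterpart stay unlabeled and are left to the classifiers trained in step e).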