4.4.4 Tree Representation
The bracketed parse tree and the stem information produced by tagging serve as input
for creating a tree data structure. The tree is composed of terminals (leaf
nodes) and non-terminals (internal nodes), all of which are known as constituents of the
tree. For export, as well as for exploring or annotating the
corpus, the tree data structures are stored in XML format, according to a schema
defined in the TigerSearch tool (footnote 10). The created tree, when visualized in TigerSearch,
looks like the one shown in Figure 4.10 (footnote 11). The terminals are labeled with their
POS tags and also contain the corresponding words and stems; the internal nodes are
labeled with their phrase types (NP, PP, etc.); and the branches are labeled as well,
with the grammatical functions of the nodes. The XML representation
of a portion of the tree is shown in Figure 4.11.
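As a rough illustration of the structure described above, the following Python sketch models constituents and serializes them to XML. It is not the system's actual code and does not reproduce Figure 4.11. The element names (s, graph, terminals, t, nonterminals, nt, edge) follow the general TIGER-XML layout; the stem attribute, the node IDs, and the edge labels in the example are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple
import xml.etree.ElementTree as ET


@dataclass
class Constituent:
    """A tree node: a terminal (leaf) or a non-terminal (internal node)."""
    node_id: str
    label: str                   # POS tag for terminals, phrase type otherwise
    word: Optional[str] = None   # set only for terminals
    stem: Optional[str] = None   # set only for terminals
    # (edge label, child) pairs; the edge label is the grammatical function
    children: List[Tuple[str, "Constituent"]] = field(default_factory=list)

    @property
    def is_terminal(self) -> bool:
        return self.word is not None

    def walk(self) -> Iterator["Constituent"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for _, child in self.children:
            yield from child.walk()


def to_xml(root: Constituent, sentence_id: str = "s1") -> ET.Element:
    """Serialize a tree into a TIGER-XML-like <s> element."""
    s = ET.Element("s", id=sentence_id)
    graph = ET.SubElement(s, "graph", root=root.node_id)
    terminals = ET.SubElement(graph, "terminals")
    nonterminals = ET.SubElement(graph, "nonterminals")
    for node in root.walk():
        if node.is_terminal:
            # The "stem" attribute is an assumption of this sketch.
            ET.SubElement(terminals, "t", id=node.node_id, word=node.word,
                          pos=node.label, stem=node.stem or "")
        else:
            nt = ET.SubElement(nonterminals, "nt", id=node.node_id, cat=node.label)
            for edge_label, child in node.children:
                ET.SubElement(nt, "edge", label=edge_label, idref=child.node_id)
    return s


# Example: the PP "im Wickelkopfbereich"; IDs and edge labels are illustrative.
pp = Constituent("s1_500", "PP", children=[
    ("AC", Constituent("s1_1", "APPRART", word="im", stem="in")),
    ("NK", Constituent("s1_2", "NN", word="Wickelkopfbereich",
                       stem="Wickel-Kopf-Bereich")),
])
xml_root = to_xml(pp)
print(ET.tostring(xml_root, encoding="unicode"))
```

Keeping the edge label on the parent-to-child link, rather than on the child itself, mirrors the description above, in which branches carry the grammatical functions.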
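[Figure 4.10: tree view in TigerSearch. Terminals in sentence order, given as word / POS tag / stem:
Unregelmässigkeiten / NN / Unregelmässigkeit
, / $, / ,
die / PRELS / d
auf / APPR / auf
eine / ART / ein
nicht / PTKNEG / nicht
mehr / PIAT / mehr
kontinuierliche / ADJA / kontinuierlich
Spannungssteuerung / NN / Spannung−Steuerung
im / APPRART / in
Wickelkopfbereich / NN / Wickel−Kopf−Bereich
hindeuten / VVFIN / hindeuten
Phrase labels in the figure: NP, S, PP, AP. Edge labels in the figure: OA, NK, RC, SB, MO, HD, AD, NG, DA.]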
Fig. 4.10. Representation of a parsed tree in the TigerSearch tool. For reasons of
space, only one branch of the tree is shown.
4.4.5 Feature Creation
Features are created from the parse tree of a sentence. A feature vector is created
for every constituent of the tree, containing some features unique to the constituent,
some features common to all constituents of the sentence, and some others calculated
with respect to the target constituent (the predicate verb).
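The three groups of features can be pictured with a small Python sketch. The feature names below (label, span_length, num_constituents, position, distance) are hypothetical placeholders chosen for illustration; they are not the feature list of the system described here. Token indices assume the sentence order of the example in Figure 4.10, with the predicate verb hindeuten as the target.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Node:
    label: str   # POS tag or phrase type
    first: int   # index of the first token covered by the constituent
    last: int    # index of the last token covered by the constituent


def relative_position(node: Node, target: Node) -> str:
    """Where the constituent lies with respect to the target."""
    if node.last < target.first:
        return "before"
    if node.first > target.last:
        return "after"
    return "overlap"


def token_distance(node: Node, target: Node) -> int:
    """Number of tokens between the constituent and the target (0 if overlapping)."""
    if node.last < target.first:
        return target.first - node.last
    if node.first > target.last:
        return node.first - target.last
    return 0


def make_feature_vectors(nodes: List[Node], target: Node,
                         target_stem: str) -> List[Dict[str, Union[str, int]]]:
    # Features shared by every constituent of the sentence (hypothetical).
    sentence_level = {"target_stem": target_stem, "num_constituents": len(nodes)}
    vectors = []
    for node in nodes:
        vectors.append({
            # Features unique to the constituent (hypothetical).
            "label": node.label,
            "span_length": node.last - node.first + 1,
            **sentence_level,
            # Features calculated with respect to the target (hypothetical).
            "position": relative_position(node, target),
            "distance": token_distance(node, target),
        })
    return vectors


# Token 8 is "Spannungssteuerung", tokens 9-10 are "im Wickelkopfbereich",
# and token 11 is the predicate verb "hindeuten".
target = Node("VVFIN", 11, 11)
nodes = [Node("NN", 8, 8), Node("PP", 9, 10), target]
vectors = make_feature_vectors(nodes, target, target_stem="hindeuten")
for vector in vectors:
    print(vector)
```

Plain dictionaries are used so that categorical values such as the label stay readable; a real system would typically encode them numerically before training a classifier.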
A detailed linguistic description of the features that different research
systems use for the SRL task can be found in [22]. In this subsection, we list only the features
used in our system and give example values for the leaf node Spannungssteuerung
of the parse tree in Figure 4.10.
10 http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/
11 English translation: “. . . irregularities, which point to a not anymore continuous
steering of voltage in the area of the winding head.”