12.4.2 Learning Renderings for Semantic Relations
Although grammar rules for PoeTryMe can be written manually, doing so may take
too much time and result in unbalanced coverage. We therefore automated this task
by exploiting human-created Portuguese poetry.
Semantic relations are known to be expressed in running text by discriminating
patterns, which are typically used to discover new relations (see e.g. Hearst [19]).
We therefore discovered line patterns automatically, by exploiting available poetry.
The discovery process consisted of the following steps:
1. Collect and tokenize all the lines of the exploited poems;
2. For each relation instance t = (w1, r, w2) in CARTÃO, select all lines (or pairs
of lines) containing both words w1 and w2;
3. For each original line selected, replace w1 and w2 respectively with the terminal
tokens indicating the first and second relation argument (<arg1> and <arg2>);
4. Add each resulting line as a grammar rule, whose name is the same as the relation
predicate r. This rule can be seen as a generic rendering for relations of type r.
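The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the actual PoeTryMe implementation: the function names, the regular-expression tokenizer, the in-memory grammar dictionary and the example line are all assumptions made for the sketch, and for brevity it only matches single lines, not pairs of lines.

```python
import re
from collections import defaultdict

# Terminal tokens standing for the first and second relation argument.
ARG1, ARG2 = "<arg1>", "<arg2>"


def tokenize(line):
    """Lowercase a line and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", line.lower())


def discover_renderings(lines, triples):
    """Map each relation predicate r to the set of line patterns found for it.

    `lines` is a list of poem lines and `triples` a list of (w1, r, w2)
    relation instances (hypothetical stand-ins for the poetry collections
    and for CARTÃO).
    """
    grammar = defaultdict(set)
    tokenized = [tokenize(line) for line in lines]            # step 1
    for w1, r, w2 in triples:
        if w1 == w2:
            continue  # the two arguments could not be told apart
        for tokens in tokenized:
            if w1 in tokens and w2 in tokens:                 # step 2
                pattern = [
                    ARG1 if tok == w1 else ARG2 if tok == w2 else tok
                    for tok in tokens
                ]                                             # step 3
                grammar[r].add(" ".join(pattern))             # step 4
    return grammar


# Made-up example line, used only to show the shape of the output.
example = discover_renderings(
    ["O destino é o futuro que nos espera"],
    [("destino", "synonym-of", "futuro")],
)
print(example["synonym-of"])
```

Storing the patterns in a set per predicate means that the same rendering found in several poems is kept only once.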
A total of 4,107 renderings were discovered, after exploiting the following textual
collections:
• Poems in Versos de Segunda, a web portal dedicated to Portuguese poetry.⁵ These
included mostly classical forms of poetry, especially sonnets and other poems that
followed a strict metre, rhythm and rhyme pattern.
• Portuguese song lyrics, transcribed in the scope of project Natura.⁶ As lyrics tend
to follow the rhythm of the song, these poems tend to have a higher degree of
freedom, concerning their form, as compared to strict forms of poetry.
Table 12.1 shows examples of the relations used, example arguments, and
automatically discovered patterns, used as renderings for the relations. We included
rough translations of the patterns.
We should add that, in order to deal with inflected nouns and to keep number
and gender agreement in the generated sentences, before discovering the patterns,
we added the number and the gender of the noun and adjective arguments to the
relation predicate name. For example, the instance {destino synonym-of futuro} was
changed to {destino ms-synonym-of-ms futuro} (both masculine, singular), while the
instance {versos part-of quadras} was changed to {versos mp-part-of-fp quadras}
(masculine, plural and feminine, plural). For the sake of clarity, this information
was not included in Table 12.1. We recall that the number and gender information
was obtained from LABEL-LEX. An alternative to this procedure would be to
associate inflection information with the argument tokens, similarly to what
Agirrezabal et al. [1] do.
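The renaming of predicates can be illustrated with a short Python sketch. It assumes that the gender and number of each word are available as a two-letter tag (e.g. ms, fp); the dictionary below is a hypothetical stand-in for a lookup in LABEL-LEX, and the function name is ours.

```python
# Hypothetical stand-in for LABEL-LEX: word -> gender/number tag
# (m/f for gender, s/p for number), using the words from the examples above.
INFLECTION = {
    "destino": "ms",
    "futuro": "ms",
    "versos": "mp",
    "quadras": "fp",
}


def annotate_predicate(w1, r, w2, lexicon=INFLECTION):
    """Return the relation instance with inflection tags added to the predicate."""
    return (w1, f"{lexicon[w1]}-{r}-{lexicon[w2]}", w2)


print(annotate_predicate("destino", "synonym-of", "futuro"))
print(annotate_predicate("versos", "part-of", "quadras"))
```

With this naming, renderings learned from masculine singular arguments are only reused for arguments with the same gender and number, which is what keeps agreement in the generated lines.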
5 Versos de Segunda is hosted at http://users.isr.ist.utl.pt/~cfb/VdS/zlista.html (as of December
2013).
6 Project Natura is hosted at http://natura.di.uminho.pt/~jj/musica/lista_transcricoes.html (as of
December 2013).