Table 4.1. Some phrases annotated with the role Cause.

German Phrase                   English Translation               Frequency
Verschmutzungseinflüsse         influences of pollution           10
leitende Verschmutzungen        conducting pollutions             8
Ionisation in Klemmenbereich    ionization in the terminal area   3
äussere Entladungen             external discharges               1

If, for every sentence with the frame Evidence, the text annotated with Symptom
and Cause is extracted, this text can then be processed further with other text
mining techniques to derive domain knowledge that is not directly available in
any of the analyzed texts. For example, one could answer questions such as: which
symptoms are the most frequent, and which causes can explain them; which problems
(i.e., causes) appear frequently in a specific type of machine; etc. Thus, such an
annotation with frames and roles preprocesses text by generating very informative
data for text mining, and it can also be used in its original form for information
retrieval. Still, such an approach only makes sense when the text contains
descriptions of repetitive tasks, which are then expressed by a small number of
underlying semantic frames. Since data mining and text mining both try to extract
knowledge from data of the same nature in the same domain, we find that annotation
of text with knowledge roles could be a valuable approach.
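
The kind of aggregation described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the record layout
(a dict with "frame" and "roles"), the helper names, the Symptom texts, and the
non-Evidence frame name are all hypothetical and do not come from the annotated
corpus; only the Cause phrases are taken from Table 4.1.

```python
from collections import Counter, defaultdict

# Toy annotated sentences (hypothetical layout and Symptom texts).
annotated = [
    {"frame": "Evidence",
     "roles": {"Symptom": "high leakage current", "Cause": "influences of pollution"}},
    {"frame": "Evidence",
     "roles": {"Symptom": "high leakage current", "Cause": "conducting pollutions"}},
    {"frame": "Evidence",
     "roles": {"Symptom": "high leakage current", "Cause": "influences of pollution"}},
    {"frame": "Evidence",
     "roles": {"Symptom": "partial discharge", "Cause": "ionization in the terminal area"}},
    # A sentence with a different (hypothetical) frame is ignored.
    {"frame": "Observation", "roles": {"Symptom": "partial discharge"}},
    # An Evidence sentence without a Cause role yields no pair.
    {"frame": "Evidence", "roles": {"Symptom": "high leakage current"}},
]


def extract_pairs(sentences, frame="Evidence"):
    """Return (symptom, cause) text pairs from sentences annotated with `frame`."""
    pairs = []
    for sentence in sentences:
        if sentence["frame"] != frame:
            continue
        roles = sentence["roles"]
        if "Symptom" in roles and "Cause" in roles:
            pairs.append((roles["Symptom"], roles["Cause"]))
    return pairs


def causes_by_symptom(pairs):
    """Count, for each symptom, how often each cause was annotated with it."""
    table = defaultdict(Counter)
    for symptom, cause in pairs:
        table[symptom][cause] += 1
    return table


pairs = extract_pairs(annotated)
symptom_counts = Counter(symptom for symptom, _ in pairs)
cause_table = causes_by_symptom(pairs)

# Most frequent symptom, and the cause most often given for it.
top_symptom, _ = symptom_counts.most_common(1)[0]
top_cause, _ = cause_table[top_symptom].most_common(1)[0]
print(top_symptom, "<-", top_cause)
```

Exact string matching is used here for simplicity; on real reports the extracted
phrases would probably need normalization (for instance, lemmatization) before
counting.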
Before explaining in detail the process of learning to automatically annotate text
with knowledge roles (based on the SRL task) in Section 4.4, we briefly discuss the
related field of information extraction.
4.3.4 Information Extraction
Information extraction (IE), often regarded as a restricted form of natural language
understanding, predates research in text mining, although today IE is seen as one of
the techniques contributing to text mining [30]. The purpose of IE is in fact very
similar to what we are trying to achieve with role annotation. In IE it is usually
known in advance what information is needed, and parts of the text are extracted to
fill the slots of a predefined template. An example, found in [20], is the job posting
template: from job posting announcements in Usenet, text that fills slots such as
title, state, city, language, and platform is extracted and stored in a database for
simpler querying and retrieval.
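
Template filling of this kind can be illustrated with a minimal Python sketch.
The slot names come from the job posting example above; everything else (the
"Key: value" posting layout, the hand-written regular expressions, and the names
JobPosting, SLOT_PATTERNS, and fill_template) is an assumption made for
illustration and is not the method used in [20].

```python
import re
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class JobPosting:
    """Predefined template; a slot stays None when nothing is extracted."""
    title: Optional[str] = None
    state: Optional[str] = None
    city: Optional[str] = None
    language: Optional[str] = None
    platform: Optional[str] = None


# One hand-written pattern per slot (hypothetical posting layout).
SLOT_PATTERNS = {
    "title": r"^Title:\s*(.+?)\s*$",
    "city": r"^Location:\s*([^,\n]+?)\s*,",
    "state": r"^Location:[^,\n]*,\s*([A-Z]{2})\b",
    "language": r"^Language:\s*(.+?)\s*$",
    "platform": r"^Platform:\s*(.+?)\s*$",
}


def fill_template(text: str) -> JobPosting:
    """Fill each slot with the first match of its pattern, if any."""
    posting = JobPosting()
    for slot, pattern in SLOT_PATTERNS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            setattr(posting, slot, match.group(1))
    return posting


# Invented posting text, used only to exercise the sketch.
sample = """Title: Software Engineer
Location: Austin, TX
Language: C++
Experience with large code bases is a plus.
"""

posting = fill_template(sample)
row = asdict(posting)  # a flat record that could be stored in a database table
print(row)
```

The filled template is a flat record, which is what makes the later querying
and retrieval simple: each slot maps directly to a database column.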
IE methods have usually been based on shallow NLP techniques that try to extract
from a corpus different types of syntactic rules matching syntactic roles to
semantic categories, as for example in [23].
With the advances in NLP and machine learning research, IE methods have
also become more sophisticated. Indeed, SRL can also be seen as a technology for
performing information extraction in those cases where the text is syntactically and
semantically more demanding and expressive. All these technologies are intended to be
used for extracting knowledge from text, despite their differences in implementation
or scope.
 