Mining Diagnostic Text Reports by Learning to Annotate Knowledge Roles - Natural Language Processing and Text Mining

Table 4.1. Some phrases annotated with the role Cause.

German Phrase                  English Translation               Frequency
Verschmutzungseinflüsse        influences of pollution           10
leitende Verschmutzungen       conducting pollutions             8
Ionisation in Klemmenbereich   ionization in the terminal area   3
äußere Entladungen             external discharges               1

If, for every sentence with the frame Evidence, the text annotated with Symptom and Cause is extracted, this text can then be processed further with other text mining techniques to derive domain knowledge that is not directly available in any of the analyzed texts. For example, one could answer questions such as: which are the most frequent symptoms, and what causes can explain them? Which problems (i.e., causes) appear frequently in a specific type of machine? Thus, annotation with frames and roles preprocesses text by generating very informative data for text mining, and it can also be used in its original form for information retrieval. Still, such an approach makes sense only when the text contains descriptions of repetitive tasks, which are then expressed by a small number of underlying semantic frames. Since data mining and text mining try to extract knowledge from data of the same nature in the same domain, we find that annotation of text with knowledge roles could be a valuable approach.
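The kind of follow-up mining described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout of an annotated sentence, the function name `mine_evidence`, the frame name "Other", and the placeholder symptom strings are all hypothetical and do not come from the chapter. Only the Cause phrases are taken from Table 4.1.

```python
from collections import Counter, defaultdict


def mine_evidence(sentences):
    """Count symptoms and, for each symptom, the causes annotated alongside it.

    Each sentence is assumed (hypothetically) to be a dict with a "frame"
    name and a "roles" mapping from role name to the annotated text.
    """
    symptom_counts = Counter()
    causes_by_symptom = defaultdict(Counter)
    for sentence in sentences:
        # Only sentences annotated with the frame Evidence are considered.
        if sentence["frame"] != "Evidence":
            continue
        symptom = sentence["roles"].get("Symptom")
        cause = sentence["roles"].get("Cause")
        if symptom is None:
            continue
        symptom_counts[symptom] += 1
        if cause is not None:
            causes_by_symptom[symptom][cause] += 1
    return symptom_counts, causes_by_symptom


# Made-up annotations; the symptom strings are placeholders.
sentences = [
    {"frame": "Evidence",
     "roles": {"Symptom": "symptom A", "Cause": "leitende Verschmutzungen"}},
    {"frame": "Evidence",
     "roles": {"Symptom": "symptom A", "Cause": "leitende Verschmutzungen"}},
    {"frame": "Evidence",
     "roles": {"Symptom": "symptom A", "Cause": "Ionisation in Klemmenbereich"}},
    {"frame": "Evidence", "roles": {"Symptom": "symptom B"}},
    {"frame": "Other", "roles": {"Symptom": "symptom C"}},
]

symptom_counts, causes_by_symptom = mine_evidence(sentences)

# Most frequent symptoms, each with its most frequently annotated cause.
for symptom, count in symptom_counts.most_common():
    print(symptom, count, causes_by_symptom[symptom].most_common(1))
```

Grouping causes under each symptom, rather than counting the two roles separately, is what allows a question such as "what causes can explain this symptom" to be answered directly from the counts.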

Before explaining in detail the process of learning to automatically annotate text with knowledge roles (based on the SRL task) in Section 4.4, we briefly discuss the related field of information extraction.

4.3.4 Information Extraction

Information extraction (IE), often regarded as a restricted form of natural language understanding, predates research in text mining, although today IE is seen as one of the techniques contributing to text mining [30]. The purpose of IE is in fact very similar to what we are trying to achieve with role annotation. In IE, it is usually known in advance what information is needed, and parts of the text are extracted to fill the slots of a predefined template. An example, found in [20], is the job posting template: from job posting announcements on Usenet, text is extracted to fill slots such as title, state, city, language, and platform, and is stored in a database for simpler querying and retrieval.
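To make the idea of template filling concrete, the following Python sketch fills the five slots named above from a short posting. It is not the method of [20]: the class `JobPosting`, the function `fill_template`, the "Slot: value" line format, and the sample posting are assumptions made purely for illustration. Real Usenet postings are free text and need far more robust extraction rules.

```python
import re
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class JobPosting:
    """Predefined template; slot names follow the example in the text."""
    title: Optional[str] = None
    state: Optional[str] = None
    city: Optional[str] = None
    language: Optional[str] = None
    platform: Optional[str] = None


def fill_template(text: str) -> JobPosting:
    """Fill each slot from a line of the (assumed) form 'Slot: value'."""
    posting = JobPosting()
    for slot in fields(posting):
        # Look for a line that starts with the slot name followed by a colon.
        match = re.search(
            rf"^{slot.name}\s*:\s*(.+)$",
            text,
            flags=re.IGNORECASE | re.MULTILINE,
        )
        if match:
            setattr(posting, slot.name, match.group(1).strip())
    return posting


# Made-up posting used only to exercise the sketch.
sample = """
Title: Software Engineer
City: Austin
State: TX
Language: C++
"""

posting = fill_template(sample)
print(posting)
```

Slots for which no matching line is found, such as `platform` in the sample, stay `None`, which mirrors a template whose slots remain empty when the text offers no filler.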

IE methods have usually been based on shallow NLP techniques that try to extract from a corpus different types of syntactic rules matching syntactic roles to semantic categories, as for example in [23].

With the advances in NLP and machine learning research, IE methods have also become more sophisticated. Indeed, SRL can also be seen as a technology for performing information extraction in cases where the text is syntactically and semantically more demanding and expressive. All these technologies are intended for extracting knowledge from text, despite their differences in implementation or scope.
