Mining Diagnostic Text Reports by Learning to Annotate Knowledge Roles - Natural Language Processing and Text Mining

engineering, and machine learning to implement a learning framework for annotating

cases with knowledge roles. The ultimate goal of the approach is to discover

interesting problem solving situations (hereafter simply referred to as cases) that can be

used by an experience management system to support new engineers during their

working activities. However, as an immediate benefit, the annotations facilitate the

retrieval of cases on demand, allow the collection of empirical domain knowledge,

and can be formalized with the help of an ontology to also permit reasoning. The

experimental results presented in the chapter are based on a collection of 500 Microsoft

Word documents written in German, amounting to about one million words. Several

processing steps were required to achieve the goal of case annotation. In particular,

we had to (a) transform the documents into an XML format, (b) extract paragraphs

belonging to cases, (c) perform part-of-speech tagging, (d) perform syntactic

parsing, (e) transform the results into an XML representation for manual annotation, (f)

construct features for the learning algorithm, and (g) implement an active learning

strategy. Experimental results demonstrate the feasibility of the learning approach

and a high quality of the resulting annotations.
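
The processing steps (a) to (g) listed above can be read as a linear pipeline in which each stage consumes the output of the previous one. The following is only a minimal sketch of that ordering, not the implementation used in the chapter: the Document container, the make_step helper, and the run_pipeline function are hypothetical names introduced for illustration, and every stage is a stub that merely records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical container handed from one processing step to the next."""
    name: str
    text: str
    history: List[str] = field(default_factory=list)


Step = Callable[[Document], Document]


def make_step(label: str) -> Step:
    """Build a placeholder step that only logs its label (no real processing)."""
    def step(doc: Document) -> Document:
        doc.history.append(label)
        return doc
    return step


# Labels mirror steps (a)-(g) from the text; the bodies are stubs (assumption).
PIPELINE: List[Step] = [
    make_step("a: transform document into XML"),
    make_step("b: extract paragraphs belonging to cases"),
    make_step("c: part-of-speech tagging"),
    make_step("d: syntactic parsing"),
    make_step("e: convert results to XML for manual annotation"),
    make_step("f: construct features for the learning algorithm"),
    make_step("g: active learning strategy"),
]


def run_pipeline(doc: Document, steps: List[Step] = PIPELINE) -> Document:
    """Apply each step in order, passing the document along."""
    for step in steps:
        doc = step(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline(Document(name="report_001", text="Beispieltext"))
    for entry in result.history:
        print(entry)
```

Keeping each stage behind the same Document-to-Document signature means a stub can later be swapped for a real tagger or parser without changing the surrounding code.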

The chapter is organized as follows. In Section 4.2 we describe our domain of

interest, the related collection of documents, and how knowledge roles can be used

to annotate text. In Section 4.3 we consider work in natural language processing,

especially frame semantics and semantic role labeling, emphasizing parallels to our

task and identifying how resources and tools from these domains can be applied to

perform annotation. Section 4.4 describes in detail all the preparatory steps for the

process of learning to annotate cases. Section 4.5 evaluates the results of learning.

Section 4.6 concludes the chapter and outlines areas of future work.

4.2 Domain Knowledge and Knowledge Roles

4.2.1 Domain Knowledge

Our domain of interest is predictive maintenance in the field of power engineering,

more specifically, the maintenance of insulation systems of high-voltage rotating

electrical machines. Since in many domains it is prohibitive to allow faults that could

result in a breakdown of the system, components of the system are periodically

or continuously monitored to look for changes in the expected behavior, in order

to undertake predictive maintenance actions when necessary. Usually, the findings

related to the predictive maintenance process are documented in several forms: the

measured values in a relational database; the evaluations of measurements/tests

in diagnostic reports written in natural language; or the recognized symptoms in

photographs. The work described here focuses on the textual diagnostic reports.

In the domain of predictive maintenance, two parties are involved: the service

provider (the company that has the know-how to perform diagnostic procedures and

recommend predictive maintenance actions) and the customer (the operator of the

machine). As part of their business agreement, the service provider submits to the

customer an official diagnostic report. Such a report follows a predefined structure

template and is written in syntactically correct and parsimonious language. In our

case, the language is German.

A report is organized into many sections: summary, reason for the inspection,

data of the inspected machine, list of performed tests and measurements, evaluations
