Summarizing Text Conversations - Methods for Mining and Summarizing Text Conversations

knowledge about the domain and which types of events are often significant in that domain. For

example, the summarizer could detect that there are many sentences in a meeting or email thread

that concern a particular action item, and the awareness that action items often form a critical part

of a summary might be part of the knowledge base available to the summarizer.

More specifically, domain knowledge can be represented in an ontology. One popular ontology

language is OWL/RDF, widely used in semantic web contexts and based on description logics, a

subset of first-order logic. An ontology typically contains a class-subclass hierarchy, properties or

relations, and instance data. For example, we may have a class Person and a subclass Manager, and

a particular instance of a manager, Heather. We may have a property or relation worksWith that

connects instances of Person. Adding instance data to the ontology is called populating the ontology.

To use the language of ontological engineering, our classes and properties are defined in the T-Box

of the ontology, while the A-Box contains our instance data. We do not go into any more detail on

ontologies here, but any primer on the semantic web should suffice to give a general overview (e.g.,

[Allemang and Hendler, 2008], [Segaran et al., 2009]).
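The Person, Manager, Heather, and worksWith example above can be sketched in plain Python to make the T-Box/A-Box split concrete. This is a toy illustration, not OWL/RDF: the Ontology class, its method names, and the second individual "Sam" are assumptions introduced here. It also treats a property's domain and range as validation checks, which is a simplification; OWL reasoners use them to infer class membership instead.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: the T-Box holds the schema, the A-Box holds instance data."""
    subclass_of: dict = field(default_factory=dict)  # T-Box: class -> parent class
    properties: dict = field(default_factory=dict)   # T-Box: property -> (domain, range)
    instance_of: dict = field(default_factory=dict)  # A-Box: individual -> class
    facts: set = field(default_factory=set)          # A-Box: (subject, property, object)

    def add_class(self, name, parent=None):
        self.subclass_of[name] = parent

    def add_property(self, name, domain, range_):
        self.properties[name] = (domain, range_)

    def populate(self, individual, cls):
        """Add instance data, i.e. populate the ontology."""
        if cls not in self.subclass_of:
            raise KeyError(f"unknown class: {cls}")
        self.instance_of[individual] = cls

    def is_a(self, individual, cls):
        """True if the individual's class is cls or a descendant of cls."""
        current = self.instance_of.get(individual)
        while current is not None:
            if current == cls:
                return True
            current = self.subclass_of.get(current)
        return False

    def assert_fact(self, subject, prop, obj):
        """Record a relation between two individuals after a domain/range check."""
        domain, range_ = self.properties[prop]
        if not (self.is_a(subject, domain) and self.is_a(obj, range_)):
            raise ValueError(f"{prop} must connect {domain} to {range_}")
        self.facts.add((subject, prop, obj))


onto = Ontology()

# T-Box: classes and properties
onto.add_class("Person")
onto.add_class("Manager", parent="Person")
onto.add_property("worksWith", "Person", "Person")

# A-Box: instance data ("Sam" is a hypothetical second individual)
onto.populate("Heather", "Manager")
onto.populate("Sam", "Person")
onto.assert_fact("Heather", "worksWith", "Sam")

print(onto.is_a("Heather", "Person"))
```

Because Manager is a subclass of Person, the final check succeeds for Heather even though she was added only as a Manager.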

An abstractive system also requires natural language generation (NLG) to create the summary

output. An NLG system typically comprises a planner to create the document structure, a microplanner

to refine the document plan through aggregation and coreference resolution, among other

tasks, and a realizer to generate the actual surface text. Reiter and Dale [2000] provide the classic

text on NLG systems and components.
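The three-stage pipeline can be sketched as three small functions. This is a minimal illustration under strong assumptions, not the architecture of any particular NLG system: the message format, the PRONOUNS lookup, and all function names are hypothetical, and each stage is reduced to a single representative task (ordering, aggregation plus pronoun substitution, and template-based realization).

```python
# Hypothetical pronoun lookup used when a subject is mentioned again.
PRONOUNS = {"Heather": "she"}


def plan_document(messages):
    """Planner: decide the order in which the content is presented."""
    return sorted(messages, key=lambda m: m["order"])


def microplan(messages):
    """Microplanner: aggregate adjacent messages that share a subject and verb,
    then replace a repeated subject with a pronoun."""
    specs = []
    for m in messages:
        last = specs[-1] if specs else None
        if last and last["subject"] == m["subject"] and last["verb"] == m["verb"]:
            last["objects"].append(m["object"])  # aggregation
        else:
            specs.append({"subject": m["subject"], "verb": m["verb"],
                          "objects": [m["object"]]})
    previous = None
    for spec in specs:
        repeated = spec["subject"] == previous
        spec["mention"] = (PRONOUNS.get(spec["subject"], spec["subject"])
                           if repeated else spec["subject"])
        previous = spec["subject"]
    return specs


def realize(specs):
    """Realizer: turn each sentence specification into surface text."""
    sentences = []
    for spec in specs:
        objs = spec["objects"]
        obj_text = objs[0] if len(objs) == 1 else ", ".join(objs[:-1]) + " and " + objs[-1]
        mention = spec["mention"]
        sentences.append(f"{mention[0].upper()}{mention[1:]} {spec['verb']} {obj_text}.")
    return " ".join(sentences)


def generate(messages):
    return realize(microplan(plan_document(messages)))


# Hypothetical input messages, deliberately out of order.
messages = [
    {"order": 2, "subject": "Heather", "verb": "approved", "object": "the schedule"},
    {"order": 1, "subject": "Heather", "verb": "approved", "object": "the budget"},
    {"order": 3, "subject": "Heather", "verb": "postponed", "object": "the launch"},
]

summary = generate(messages)
print(summary)
# Heather approved the budget and the schedule. She postponed the launch.
```

Keeping the stages separate means that, for example, the aggregation rule can change without touching the templates in the realizer.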

4.2.3 OUTPUTS AND INTERFACES

Although research on automatic summarization usually concerns the generation of textual summaries,

summaries need not be text-based. A meeting summary could consist of concatenated audio

clips from the discussion, while any conversation could be summarized with a graphics-based visualization

highlighting information such as participant activity and topic dispersion.

Even with textual summaries, many types of output are possible. One could generate well-formed

paragraphs of coherent text describing the conversation at a high level, a so-called abstractive

summary necessitating a text generation component. A simpler approach, but one that leads to less

coherent summaries, is the extractive approach of simply classifying some sentences as important

and pasting them together. One could also generate a word cloud or a list of dates, named entities, or

keywords. A word cloud might not seem like a summary in the traditional sense, but it does fit the

definition of condensing a document to a simple representation of its most important components.

For example, Figure 4.1 shows a word cloud representing the email conversation shown earlier.
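Two of the simpler output types, an extractive summary and a keyword list of the kind that could feed a word cloud, can be sketched from the same word counts. This is a rough illustration only: it swaps in summed word frequency as a stand-in for a trained importance classifier, and the stopword list, function names, and the four-sentence thread are all hypothetical. Summing rather than averaging the counts favors longer sentences, which a real system would need to correct for.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "and", "we", "i", "me",
             "you", "can", "will", "then", "see", "before", "thanks"}


def tokenize(text):
    """Lowercase the text and keep the words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def word_counts(sentences):
    return Counter(w for s in sentences for w in tokenize(s))


def top_keywords(sentences, k=3):
    """Keyword list: the k most frequent content words with their counts."""
    return word_counts(sentences).most_common(k)


def extractive_summary(sentences, n=2):
    """Pick the n highest-scoring sentences and return them in original order."""
    counts = word_counts(sentences)
    scores = [sum(counts[w] for w in tokenize(s)) for s in sentences]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]


# Hypothetical email thread, one sentence per item.
thread = [
    "Can we move the budget review to Friday?",
    "Friday works for me.",
    "I will send the budget spreadsheet before the review.",
    "Thanks, see you then.",
]

print(extractive_summary(thread, n=2))
print(top_keywords(thread, k=3))
```

On this thread the first and third sentences are selected, and "budget", "review", and "friday" come out as the most frequent content words. The pasted-together result shows the coherence problem noted above: the selected sentences were never written to follow one another.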

Summary outputs and interfaces also vary according to how a summary is intended to be used.

If a summary is meant to serve as an index to browsing the original document, then it might be

situated in a browsing interface along with a variety of other search and browsing functions. Indeed, a

browsing interface could feature multi-modal summary types such as an abstractive textual summary

alongside a word cloud and some visualizations of participant activity. All of these summary types

would then be linked to the original conversation record and possibly to each other.
