Introduction - Methods for Mining and Summarizing Text Conversations

1.3 RELATED TOPICS AND BACKGROUND READINGS

While the focus of this book is on summarizing text conversations, there has been considerable recent work on speech summarization. This includes cases where textual features are supplemented by speech features, such as prosody¹³, extracted directly from the speech signal, as well as cases where textual transcription is bypassed altogether in order to create speech-to-speech summaries. A forthcoming Synthesis Lecture on Speech Summarization [Penn and Zhu, Forthcoming] provides a comprehensive introduction to such methods and is a natural complement to this one. Another Synthesis Lecture that is highly relevant to ours is by Agarwal and Liu [2009]. However, while our focus is mainly on dealing with a single text conversation at a time, they explore approaches for modeling and mining huge collections of intertwined conversations. More specifically, they focus on the space of all blogs that constitutes the blogosphere, and discuss tools for clustering blog conversations, extracting communities, and identifying influential bloggers within a community.

In our exploration of methods and tools for mining and summarizing text conversations, we will often refer to general-purpose techniques for processing and visualizing text. Although we will always try to provide the necessary background, the interested reader can refer to the following publications for a more comprehensive treatment of the different subjects. The leading introduction to the field of Natural Language Processing (NLP) is Jurafsky and Martin [2008], which covers, among other topics, basic techniques for information extraction, text segmentation, and text summarization. Most of the methods presented in this book rely on Machine Learning (ML) techniques that have become increasingly popular in NLP over the last decade. Learning paradigms of all kinds have been applied to mine and summarize conversations, including supervised, unsupervised, and semi-supervised ones. For an introduction to ML, see Poole and Mackworth [2010]. For a more comprehensive treatment of the subject, refer to Murphy [expected, Spring 2012]. Semi-supervised methods are described in Zhu and Goldberg [2009]. Opinion mining from text has received a great deal of attention in recent years; Pang and Lee [2008] provide an up-to-date survey of the field. To the best of our knowledge, there is no book completely devoted to Information Visualization for text analysis; however, a concise introduction to the field is provided by Hearst [2009] (Chapter 11).

1.4 MINING AND SUMMARIZING TEXT CONVERSATIONS: AN OVERVIEW

The sample application scenarios we described in Section 1.2 (and many others yet to be explored) require powerful computational methods to mine and summarize text conversations. Although the details of the specific methods will be discussed in later chapters, here we give an overview of the basic intuitions and principles at the core of these methods. Key definitions and illustrative examples will also be provided. As a running example, we will refer to the synthetic sample email conversation shown in Figure 1.4, which involves three participants and seven email messages.

13 Prosody refers to properties of the acoustic signal associated with an utterance. These include rhythm, stress, and intonation of speech. Prosody has many pragmatic functions. For instance, in many languages, speakers use prosody to convey irony or surprise, to signal emphasis or contrast, and to ask a question.
