1.3 RELATED TOPICS AND BACKGROUND READINGS
While the focus of this book is on summarizing text conversations, there has been considerable
recent work on speech summarization. This includes cases where textual features are supplemented
by speech features such as prosody (see footnote 13) extracted directly from the speech signal, and
cases where textual transcription is bypassed altogether in order to create speech-to-speech
summaries. A forthcoming Synthesis Lecture on Speech Summarization [Penn and Zhu, Forthcoming]
provides a comprehensive introduction to such methods and is a useful complement to this one.
Another Synthesis Lecture that is very relevant to ours is by Agarwal and Liu [2009]. However,
while our focus is mainly on dealing with a single text conversation at a time, they explore
approaches for modeling and mining huge collections of intertwined conversations. More specifically,
they focus on the space of all blogs that constitutes the blogosphere, and discuss tools for
clustering blog conversations, extracting communities, and identifying influential bloggers within
a community.
In our exploration of methods and tools for mining and summarizing text conversations, we
will often refer to general-purpose techniques for processing and visualizing text. Although we will
always try to provide the necessary background, the interested reader can refer to the following
publications for a more comprehensive treatment of the different subjects. The leading introduction
to the field of Natural Language Processing (NLP) is Jurafsky and Martin [2008], which covers,
among other topics, basic techniques for information extraction, text segmentation, and text
summarization. Most of the methods presented in this book rely on machine learning (ML) techniques
that have become increasingly popular in NLP in the last decade. All kinds of learning paradigms
have been applied to mine and summarize conversations, including supervised, unsupervised, and
semi-supervised ones. For an introduction to ML, see Poole and Mackworth [2010]. For a more
comprehensive treatment of the subject, refer to Murphy [expected, Spring 2012]. Semi-supervised
methods are described in Zhu and Goldberg [2009]. Opinion mining from text has received a great
deal of attention in recent years. Pang and Lee [2008] provide an up-to-date survey of the field. To
the best of our knowledge, there is no book completely devoted to Information Visualization for text
analysis; however, a concise introduction to the field is provided by Hearst [2009] (Chapter 11).
1.4 MINING AND SUMMARIZING TEXT CONVERSATIONS: AN OVERVIEW
The sample application scenarios we described in Section 1.2 (and many others yet to be explored)
require powerful computational methods to mine and summarize text conversations. Although the
details of the specific methods will be discussed in later chapters, here we give an overview of the
basic intuitions and principles at the core of these methods. Key definitions and illustrative
examples will also be provided. As a running example, we will refer to the sample synthetic email
conversation shown in Figure 1.4, which involves three participants and seven email messages.
13 Prosody refers to properties of the acoustic signal associated with an utterance. These include rhythm, stress, and intonation of
speech. Prosody has many pragmatic functions. For instance, in many languages, speakers use prosody to convey irony or surprise,
to signal emphasis or contrast, and to ask a question.