CHAPTER 3
Mining Text Conversations
3.1 INTRODUCTION
In this chapter, we describe several text mining techniques that can be applied to conversations
and explain why they might be useful for other tasks such as summarization. We can see from the
example conversation in Chapter 1 that there are numerous questions a person might ask if they
wanted to understand the discussion quickly: What was the topic of discussion? What was proposed?
What opinions were expressed? What was finally decided? The mining techniques described in this
chapter attempt to answer these and other questions.
We first discuss topic modeling, comprising the two related tasks of topic segmentation and
topic labeling. We then describe the broad field of sentiment and subjectivity detection and several
specific sentiment tasks. After that, we cover tasks related to mining the conversation structure,
including dialogue act classification, decision and action item detection, and extraction of thread
structure. In each section, we give examples of current work on conversational data. We conclude
the chapter by giving pointers to further reading on each area of interest.
3.2 TOPIC MODELING: TOPIC SEGMENTATION AND TOPIC LABELING
Any document spanning more than a few sentences is very likely to cover more than one topic. For
instance, Hearst [1997] reports that in her corpus of expository text the end of each paragraph has
approximately a 40% chance of being a topic boundary.
The task of topic modeling aims to capture the topical structure of a document (or a collection
of documents) by identifying what topics are discussed in the text, and which portions of text
correspond to which topics. When the goal is limited to splitting the input document(s) into segments,
where each segment is about a single topic, we talk about topic segmentation. In contrast, complete
topic modeling includes both topic segmentation and topic labeling, in which all the topics covered in
the input document(s) are labeled with informative descriptions, ranging from simple sets of words
to more informative phrases.
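To make the distinction between the two subtasks concrete, the following is a minimal sketch in Python. It is not the method of Hearst [1997] or of any system discussed in this chapter. It assumes that paragraphs are the unit of segmentation, that a drop in word overlap between adjacent paragraphs signals a topic boundary, and that a segment can be labeled with its most frequent content words (the "simple sets of words" end of the labeling range). The function names, the small stopword list, and the threshold value are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was",
             "it", "by", "on", "for", "with", "its", "from"}


def bag_of_words(text):
    """Lowercase the text and count its non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(paragraphs, threshold=0.1):
    """Topic segmentation (hypothetical helper).

    Returns a list of segments, each a list of paragraph indices. A new
    segment starts wherever two adjacent paragraphs have a cosine
    similarity below the (arbitrary) threshold.
    """
    if not paragraphs:
        return []
    bags = [bag_of_words(p) for p in paragraphs]
    segments = [[0]]
    for i in range(1, len(paragraphs)):
        if cosine(bags[i - 1], bags[i]) < threshold:
            segments.append([i])      # low overlap: open a new segment
        else:
            segments[-1].append(i)    # enough overlap: extend the current one
    return segments


def label(paragraphs, segment_indices, k=3):
    """Topic labeling (hypothetical helper): the k most frequent content words."""
    counts = Counter()
    for i in segment_indices:
        counts.update(bag_of_words(paragraphs[i]))
    return [word for word, _ in counts.most_common(k)]
```

Calling segment on a list of paragraphs gives the flat segmentation, and calling label on each resulting segment gives a word-set description of it. Phrase-level labels would need more than raw word counts.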
As an example, Table 3.1 shows a possible multi-paragraph topic model for a 23-paragraph
article about the exploration of Venus by the Magellan space probe¹.
¹ Source: http://people.ischool.berkeley.edu/~hearst/research/tiling.html
Topic models can be flat or hierarchical. In a flat topic model, text is modeled as a sequence of
topical segments with no further decomposition, while in a hierarchical topic model segments can be