Introduction - Methods for Mining and Summarizing Text Conversations


For illustration, Figure 1.6 shows two abstractive summaries of our sample email conversation, while Figure 1.7 shows one extractive summary of the same conversation. Notice how the level of abstraction in abstractive summarization can vary considerably, with the first abstractive summary being much more abstract than the second one.

Figure 1.6: Two abstractive summaries of our synthetic email conversation.

In Chapter 4, we will see that most of the summarizers for text conversations developed so far are fundamentally extractive in nature. However, in that chapter, we will also cover a few very recent studies on applying abstractive summarization to text conversations [Murray et al., 2010].

Generic vs. Query-based Summarization. Another important dimension related to the input of the summarization process is whether the user explicitly states her information needs by means of a query. If so, a good summary should not be generated generically but should focus on the query, which could refer, for instance, to a particular event, date, or person. In practice, a query-based summarizer can focus on the query by taking it into account when deciding whether to include some content (a sentence or a piece of information) in the summary. This is typically done by measuring the overlap or similarity between that content and the query. A similar approach can be followed for text conversations. For instance, a common feature used for measuring informativeness in email summarization is subject-line overlap or similarity (e.g., [Nenkova and Bagga, 2003]). If we combine the subject line with a user-provided query, we can generate query-dependent summaries that are tailored to a particular information need. As another example, consider the work by Sharifi et al. [2010], where the task is to automatically summarize microblogs such as Twitter messages. The algorithm takes as input a topic phrase (e.g., Ice Dancing) along with a set of sentences from relevant tweets, and it generates an extractive, query-based summary intended to concisely convey why the topic is currently popular on Twitter (e.g., "Ice Dancing Canadians Tessa Virtue and Scott Moir clinch the gold in Olympic ice dancing; U.S. pair Davis and White win silver; 2/22/2010").
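The overlap-based content selection described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any of the cited works: it represents the query (optionally concatenated with the email subject line) and each candidate sentence as bag-of-words term-frequency vectors, scores each sentence by cosine similarity to the query, and keeps the top-scoring sentences in their original order, which yields an extractive, query-based summary. The function names (`tokenize`, `cosine_similarity`, `query_based_summary`), the small stop-word list, and the example email sentences are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "to", "of", "for", "and", "in", "on",
             "at", "i", "we", "me", "can", "when", "after", "before"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_based_summary(sentences: list[str], query: str,
                        subject: str = "", max_sentences: int = 2) -> list[str]:
    """Hypothetical extractive, query-based summarizer.

    The query and the optional subject line are merged into one focus vector;
    each sentence is scored by cosine similarity to it, and the top-scoring
    sentences with non-zero overlap are returned in their original order.
    """
    focus = Counter(tokenize(query + " " + subject))
    scored = [(cosine_similarity(Counter(tokenize(s)), focus), i)
              for i, s in enumerate(sentences)]
    # Highest score first; ties broken by position in the conversation.
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))
    keep = sorted(i for _, i in ranked[:max_sentences])
    return [sentences[i] for i in keep]


# Made-up email sentences, for illustration only.
emails = [
    "Can we move the project meeting to Friday?",
    "I finished the budget draft last night.",
    "Friday works for me, the meeting room is free after lunch.",
    "Please send the budget draft to Ana.",
]
# Prints the first and third sentences, the two that mention the meeting.
print(query_based_summary(emails, "when is the meeting", subject="Project meeting"))
```

Plain term-frequency cosine is only one possible similarity measure; TF-IDF weighting or embedding-based similarity could be swapped in without changing the selection logic, and dropping the `subject` argument gives a purely query-driven variant.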

For an example of a query-based abstractive summary of our synthetic email conversation, see Figure 1.8.
