Introduction - Methods for Mining and Summarizing Text Conversations


Figure 1.8: Sample query-focused (abstractive) summary of our synthetic email conversation.

example, a human-authored abstract in a science journal will usually give a high-level overview of the experiments and conclusions, but may highlight a key finding in some detail.

Domain-Specific vs. General-Purpose Summarization. We have mentioned conversation types such as emails, meetings, blogs and chats, and we refer to these as separate conversation modalities. Modality here refers to a means or mode of communication, where a particular conversation modality may be associated with distinct communication technologies as well as distinct social conventions and language characteristics. From a more general viewpoint, without reference to communication or language, these can also be considered distinct domains, and we will use the two terms more or less interchangeably here.

For many tasks there is a tension between developing solutions that are general and broadly applicable, and implementing tools that work only in specific domains but are highly effective. Summarization is no exception in this respect. Researchers have worked both on domain-specific systems (e.g., McKeown et al. [2002] for news, Zhou et al. [2004] for biographies) and on general-purpose platforms [Radev et al., 2004]. A related distinction for summarizing text conversations, which will be discussed in Chapter 4, is whether a summarization approach can be applied only to a particular conversational modality (e.g., email), or whether it can work on any text conversation, independently of its modality. While most of the summarizers described in Chapter 4 are domain/modality specific, as they exploit features peculiar to those modalities (e.g., the subject line for emails, user ratings for blog posts), we will also cover recent attempts to design a multi-modal system [Murray and Carenini, 2008] that relies only on features common to all multi-party interaction, such as speaker dominance in the conversations, turn-taking, lexical cohesion, etc. This system is not only capable of summarizing conversations in different modalities (e.g., meetings, emails, blogs), but it can also work on conversations spanning multiple modalities (e.g., a transcript of a meeting that was followed up by an email conversation). A multi-modal approach presents two additional, critical advantages. First, by harnessing only features shared by all the modalities, it can facilitate the transfer of knowledge from one modality to another [Sandu et al., 2010], which in machine learning is called domain adaptation [Daumé and Marcu, 2006]. Second, this general approach should easily cover novel conversational modalities, which are constantly being created through people's creativity and technological advancements.
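
To make the idea of modality-independent features more concrete, the following is a minimal sketch of how speaker dominance, turn-taking and lexical cohesion might be computed from nothing more than a sequence of (speaker, text) turns. It is not the feature set of Murray and Carenini [2008]. The function names, the word-share definition of dominance, the speaker-change count and the Jaccard overlap used for cohesion are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a turn and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def conversation_features(turns):
    """Compute illustrative, modality-independent conversation features.

    `turns` is a list of (speaker, text) pairs in conversation order.
    Nothing here depends on email headers, blog ratings, or any other
    modality-specific field (hypothetical feature definitions).
    """
    # Speaker dominance (assumed definition): share of all words per speaker.
    words_per_speaker = Counter()
    for speaker, text in turns:
        words_per_speaker[speaker] += len(tokenize(text))
    total_words = sum(words_per_speaker.values())
    dominance = {}
    if total_words:
        dominance = {s: n / total_words for s, n in words_per_speaker.items()}

    # Turn-taking (assumed definition): number of times the speaker changes
    # between consecutive turns.
    pairs = list(zip(turns, turns[1:]))
    speaker_changes = sum(1 for (a, _), (b, _) in pairs if a != b)

    # Lexical cohesion (assumed definition): Jaccard overlap between the
    # vocabularies of each pair of adjacent turns.
    cohesion = []
    for (_, prev_text), (_, curr_text) in pairs:
        prev_vocab = set(tokenize(prev_text))
        curr_vocab = set(tokenize(curr_text))
        union = prev_vocab | curr_vocab
        cohesion.append(len(prev_vocab & curr_vocab) / len(union) if union else 0.0)

    return {
        "dominance": dominance,
        "speaker_changes": speaker_changes,
        "cohesion": cohesion,
    }


if __name__ == "__main__":
    # The same input shape could come from a meeting transcript, an email
    # thread, or a blog comment section.
    sample_turns = [
        ("ann", "Can we move the meeting"),
        ("bob", "Yes move the meeting"),
        ("ann", "Great"),
    ]
    print(conversation_features(sample_turns))
```

Because the input is only a list of speaker and text pairs, the same function can be applied to turns drawn from different modalities, or to a single conversation that spans several of them, which is the property the multi-modal approach relies on.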
