4.2.2 MEASURING INFORMATIVENESS
In the previous section, we introduced the idea of representing sentences using term-weights and proposed a very simple summarizer that weights sentences by summing or averaging over the constituent term-weights. However, we can use these term-weights to build a considerably more sophisticated summarizer. By representing sentences as vectors of term-weights, we can measure the similarity of two sentences by calculating the cosine of the angle between their vectors. This similarity metric is essentially the normalized dot product of the vectors and ranges from 0, when sentences share no terms, to 1, when sentences are identical. We can now use this cosine similarity metric in a variety of ways: if we are doing query-based summarization, we can calculate the similarity of a candidate sentence to the query; if we are doing multi-document generic summarization, we can calculate the similarity of a candidate sentence to the set of sentences already selected for extraction. In fact, this is precisely what is done in the popular Maximal Marginal Relevance (MMR) summarization approach [Carbonell and Goldstein, 1998].
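To make the metric concrete, here is a minimal sketch of cosine similarity over sparse term-weight vectors, represented as dictionaries mapping terms to weights; the representation and the toy sentence vectors are our own illustrative assumptions, not part of the original discussion.

```python
import math

def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse term-weight vectors,
    each represented as a dict mapping terms to weights."""
    # The dot product only involves terms the two vectors share.
    dot = sum(w * vec_b[t] for t, w in vec_a.items() if t in vec_b)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two sentences sharing one term score strictly between 0 and 1.
s1 = {"court": 0.7, "ruling": 0.5}
s2 = {"court": 0.6, "appeal": 0.8}
print(cosine(s1, s2))  # > 0 because both contain "court"
```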
In MMR, sentences are chosen according to a weighted combination of their relevance to a query (or, for generic summaries, their general relevance) and their redundancy with the sentences that have already been extracted. Both relevance and redundancy are measured using cosine similarity.
The usual MMR score $Sc_{MMR}(i)$ for a given sentence $S_i$ in the document is given by

$$Sc_{MMR}(i) = \lambda \cos(S_i, q) - (1 - \lambda) \max_{S_j \in \mathrm{summ}} \cos(S_i, S_j),$$
where $q$ is the query vector, $\mathrm{summ}$ is the set of sentences already extracted, and $\lambda$ trades off between relevance and redundancy. The term $\cos$ is the cosine similarity between two term-weight vectors. The MMR algorithm generates the extractive summary iteratively, at each step selecting the sentence $S_i$ that maximizes $Sc_{MMR}(i)$ and then recalculating the scores of the remaining unselected sentences. This recalculation is necessary because the redundancy scores change each time a new sentence is added to the summary. If $\lambda$ equals 1, however, redundancy is ignored entirely and MMR simply returns the sentences most similar to the query.
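The greedy selection loop can be sketched as follows. This is a direct reading of the formula above, reusing the cosine function from the earlier sketch; the budget parameter for summary length and all names here are our own assumptions rather than part of the original formulation.

```python
def mmr_summarize(sentences, query, lam=0.7, budget=3):
    """Greedy MMR extraction. sentences is a list of sparse term-weight
    vectors (dicts), query is a vector of the same form, and lam trades
    off relevance against redundancy."""
    selected = []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < budget:
        def score(i):
            relevance = cosine(sentences[i], query)
            # Redundancy is the maximum similarity to any sentence
            # already in the summary (0 while the summary is empty).
            redundancy = max((cosine(sentences[i], sentences[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        # Scores are recomputed every iteration: the redundancy term
        # changes as each new sentence enters the summary.
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```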
Whereas MMR is an unsupervised extraction algorithm, many recent extractive systems take a supervised machine learning approach and incorporate a variety of features in addition to term-weights such as tf.idf. A classifier is trained on data where each sentence is hand-labeled as informative or not informative, and sentences in the test data are classified as informative or non-informative based on the trained model. In the sections below we will discuss the types of features used by different supervised systems. Because the supervised classifier typically predicts only the relevance of the candidate sentences, such summarization systems often incorporate a post-classification step designed to reduce redundancy; as sketched below, this might involve clustering the informative sentences and selecting only a handful from each cluster.
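As a hedged illustration of such a pipeline (not a description of any particular published system), the following sketch trains a logistic regression classifier on tf.idf features using scikit-learn, then clusters the sentences predicted informative and keeps one representative per cluster. The feature set is deliberately minimal; real supervised systems use many additional features beyond term-weights.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

def supervised_summarize(train_sents, train_labels, test_sents, n_clusters=3):
    """Train an informativeness classifier, then reduce redundancy by
    clustering the informative sentences and keeping one per cluster."""
    vec = TfidfVectorizer()
    clf = LogisticRegression().fit(vec.fit_transform(train_sents), train_labels)

    # Classify test sentences as informative (1) or not (0).
    preds = clf.predict(vec.transform(test_sents))
    informative = [s for s, y in zip(test_sents, preds) if y == 1]
    if len(informative) <= n_clusters:
        return informative

    # Post-classification redundancy reduction: cluster the informative
    # sentences and pick the one closest to each cluster centroid.
    X_inf = vec.transform(informative)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X_inf)
    dists = km.transform(X_inf)  # distance of each sentence to each centroid
    summary = []
    for c in range(n_clusters):
        members = [i for i, lbl in enumerate(km.labels_) if lbl == c]
        summary.append(informative[min(members, key=lambda i: dists[i, c])])
    return summary
```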
The preceding discussion of informativeness is relevant primarily to the extractive paradigm; the architecture of abstractive systems is typically quite different. Rather than rating the informativeness of individual sentences, abstractive summarizers tend to look for patterns, messages, or events that abstract over numerous sentences. Informativeness might be based at least partly on