4.2.2 MEASURING INFORMATIVENESS
In the previous section, we introduced the idea of representing sentences using term-weights and proposed a very simple summarizer that weights sentences by summing or averaging over the constituent term-weights. However, we can use these term-weights to build a considerably more sophisticated summarizer. By representing sentences as vectors of term-weights, we can measure the similarity of two sentences by calculating the cosine of the angle between their vectors. This similarity metric is essentially the normalized dot product of the vectors and ranges from 0, when sentences share no terms, to 1, when sentences are identical. We can now use this cosine similarity metric in a variety of ways: if we are doing query-based summarization, we can calculate the similarity of a candidate sentence to the query; if we are doing multi-document generic summarization, we can calculate the similarity of a candidate sentence to the set of sentences already selected for extraction. In fact, this is precisely what is done in the popular Maximal Marginal Relevance (MMR) summarization approach [Carbonell and Goldstein, 1998].
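To make the metric concrete, here is a minimal sketch of cosine similarity over sparse term-weight vectors, represented as dictionaries mapping terms to weights; the representation and the toy sentence vectors are our own illustrative assumptions, not part of the original discussion.

```python
import math

def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse term-weight vectors,
    each represented as a dict mapping terms to weights."""
    # The dot product only involves terms the two vectors share.
    dot = sum(w * vec_b[t] for t, w in vec_a.items() if t in vec_b)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two sentences sharing one term score strictly between 0 and 1.
s1 = {"court": 0.7, "ruling": 0.5}
s2 = {"court": 0.6, "appeal": 0.8}
print(cosine(s1, s2))  # > 0 because both contain "court"
```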
In MMR, sentences are chosen according to a weighted combination of their relevance to a query (or, for generic summaries, their general relevance) and their redundancy with the sentences that have already been extracted. Both relevance and redundancy are measured using cosine similarity.
The usual MMR score $Sc_{MMR}(i)$ for a given sentence $S_i$ in the document is given by

$$Sc_{MMR}(i) = \lambda \cos(S_i, q) - (1 - \lambda) \max_{S_j \in \mathrm{summ}} \cos(S_i, S_j),$$
where $q$ is the query vector, $\mathrm{summ}$ is the set of sentences already extracted, and $\lambda$ trades off between relevance and redundancy. The term $\cos$ is the cosine similarity between two term-weight vectors. The MMR algorithm generates the extractive summary iteratively, at each step selecting the sentence $S_i$ that maximizes $Sc_{MMR}(i)$ and then recalculating the scores of the remaining unselected sentences. This recalculation is necessary because the redundancy scores change each time a new sentence is added to the summary. If $\lambda$ equals 1, however, redundancy is ignored entirely and MMR simply returns the sentences most similar to the query.
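The greedy selection loop can be sketched as follows. This is a direct reading of the formula above, reusing the cosine function from the earlier sketch; the budget parameter for summary length and all names here are our own assumptions rather than part of the original formulation.

```python
def mmr_summarize(sentences, query, lam=0.7, budget=3):
    """Greedy MMR extraction. sentences is a list of sparse term-weight
    vectors (dicts), query is a vector of the same form, and lam trades
    off relevance against redundancy."""
    selected = []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < budget:
        def score(i):
            relevance = cosine(sentences[i], query)
            # Redundancy is the maximum similarity to any sentence
            # already in the summary (0 while the summary is empty).
            redundancy = max((cosine(sentences[i], sentences[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        # Scores are recomputed every iteration: the redundancy term
        # changes as each new sentence enters the summary.
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```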
Whereas MMR is an unsupervised extraction algorithm, many recent extractive systems take a supervised machine learning approach and incorporate a variety of features in addition to term-weights such as tf.idf. A classifier is trained on data where each sentence is hand-labeled as informative or not informative, and sentences in the test data are classified as informative or non-informative based on the trained model. In the sections below we will discuss the types of features used by different supervised systems. Because the supervised classifier typically predicts only the relevance of the candidate sentences, such summarization systems often incorporate a post-classification step designed to reduce redundancy; as sketched below, this might involve clustering the informative sentences and selecting only a handful from each cluster.
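As a hedged illustration of such a pipeline (not a description of any particular published system), the following sketch trains a logistic regression classifier on tf.idf features using scikit-learn, then clusters the sentences predicted informative and keeps one representative per cluster. The feature set is deliberately minimal; real supervised systems use many additional features beyond term-weights.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

def supervised_summarize(train_sents, train_labels, test_sents, n_clusters=3):
    """Train an informativeness classifier, then reduce redundancy by
    clustering the informative sentences and keeping one per cluster."""
    vec = TfidfVectorizer()
    clf = LogisticRegression().fit(vec.fit_transform(train_sents), train_labels)

    # Classify test sentences as informative (1) or not (0).
    preds = clf.predict(vec.transform(test_sents))
    informative = [s for s, y in zip(test_sents, preds) if y == 1]
    if len(informative) <= n_clusters:
        return informative

    # Post-classification redundancy reduction: cluster the informative
    # sentences and pick the one closest to each cluster centroid.
    X_inf = vec.transform(informative)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X_inf)
    dists = km.transform(X_inf)  # distance of each sentence to each centroid
    summary = []
    for c in range(n_clusters):
        members = [i for i, lbl in enumerate(km.labels_) if lbl == c]
        summary.append(informative[min(members, key=lambda i: dists[i, c])])
    return summary
```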
The preceding discussion of informativeness is relevant primarily to the extractive paradigm; the architecture of abstractive systems is typically quite different. Rather than rating the informativeness of individual sentences, abstractive summarizers tend to look for patterns, messages, or events that abstract over numerous sentences. Informativeness might be based at least partly on