3. Repeat step 2 until all the passages in the current list have been
examined.
After applying this algorithm, each passage in the new list is sufficiently dissimilar to the others, thus favoring diversity over redundancy in the new ranked list. The anti-redundancy threshold t is tuned on a training set.
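As a minimal sketch of this filtering step, the Python fragment below scans the ranked list top-down and keeps a passage only if its similarity to every passage kept so far stays below t. The cosine similarity over bag-of-words vectors is an assumption made for illustration; the text specifies only that passages are compared pairwise against the threshold.

def cosine_similarity(a, b):
    # Cosine similarity between two bag-of-words dicts (term -> weight).
    # This particular similarity function is an illustrative choice, not
    # one prescribed by the text.
    shared = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared)
    norm_a = sum(w * w for w in a.values()) ** 0.5
    norm_b = sum(w * w for w in b.values()) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def remove_redundant(ranked_passages, t):
    # Scan the ranked list top-down, keeping a passage only if it is
    # sufficiently dissimilar (similarity < t) to every passage kept so
    # far. t is the anti-redundancy threshold, tuned on a training set.
    kept = []
    for passage in ranked_passages:
        if all(cosine_similarity(passage, prev) < t for prev in kept):
            kept.append(passage)
    return kept

A larger t admits more near-duplicate passages, while a smaller t prunes more aggressively, which is why t must be tuned on held-out training data.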
9.4 Evaluation Methodology
The approach we proposed above for information distillation raises important issues regarding evaluation methodology. Firstly, since our system allows the output to be passages at different levels of granularity (e.g., k-sentence windows where k may vary) instead of a fixed level, it is not possible to have pre-annotated relevance judgments at all such granularity levels.
Secondly, since we wish to measure the utility of the system output as a combination of both relevance and novelty, traditional relevance-only measures must be replaced by measures that penalize the repetition of the same information in the system output across time. Thirdly, since the output of the system consists of ranked lists, we must reward systems that present useful information (both relevant and previously unseen) in shorter ranked lists, and penalize those that present the same information in longer ranked lists. None of the existing measures in ad hoc retrieval, adaptive filtering, novelty detection, or other related areas (text summarization and question answering) has the desired properties in all three aspects. Therefore, we must develop a new measure.
9.4.1 Answer Keys
To enable the evaluation of a system whose output consists of passages of
arbitrary length, we borrow the concept of answer keys from the Question
Answering (QA) community, where systems are allowed to produce arbitrary
spans of text as answers. Answer keys define what should be present in a system response to receive credit, and consist of a collection of information nuggets, i.e., factoid units about which human assessors can make binary decisions as to whether or not a system response contains them.
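To make the answer-key machinery concrete, the Python sketch below scores a response against a list of nuggets. The token-overlap test and its 0.6 threshold are invented stand-ins for the assessor's binary decision; as the next paragraph explains, the real decision requires semantic mapping by humans.

import string

def _tokens(text):
    # Lowercase tokens with surrounding punctuation stripped.
    return [tok.strip(string.punctuation) for tok in text.lower().split()]

def contains_nugget(response, nugget, min_overlap=0.6):
    # Binary decision: does the response contain this nugget?
    # Approximated here by the fraction of nugget tokens appearing in
    # the response; a crude proxy for a human semantic judgment.
    response_tokens = set(_tokens(response))
    nugget_tokens = _tokens(nugget)
    matched = sum(1 for tok in nugget_tokens if tok in response_tokens)
    return matched / len(nugget_tokens) >= min_overlap

def nugget_recall(response, answer_key):
    # Fraction of nuggets in the answer key credited to the response.
    hits = sum(contains_nugget(response, nugget) for nugget in answer_key)
    return hits / len(answer_key)

# Hypothetical answer key, for illustration only.
answer_key = ["magnitude 7.9 earthquake", "struck Sichuan province"]
print(nugget_recall("A magnitude 7.9 earthquake struck Sichuan.", answer_key))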
Defining answer keys and the associated binary decisions is a conceptual
task that requires semantic mapping (22), since a system can present the
same piece of information in many different ways. Hence, QA evaluations have relied on human assessors, making them costly, time-consuming, and not scalable to large query sets, document collections, and extensive system evaluations with various parameter settings.
 