Applications of Learning to Rank - Learning to Rank for Information Retrieval


as requiring more complex natural language processing (NLP) techniques than document retrieval, and natural language search engines are sometimes regarded as the next-generation search engines.

In this section, we will review the use of learning-to-rank technologies in several QA tasks, including definitional QA, quantity consensus QA, non-factoid QA, and Why QA.

14.2.1 Definitional QA

Definitional QA is a specific task in the TREC-QA track. Given a question of the form “what is X” or “who is X”, one extracts answers from multiple documents and combines the extracted answers into a single unified answer. QA is ideal as a means of helping people find definitions, but it can be difficult to realize in practice: definitions extracted from different documents usually describe the term from different perspectives, and thus are not easy to combine. A more practical way of dealing with the problem is to rank the extracted definitions according to their likelihood of being good definitions, which is called definition search [16].

For this purpose, the first step is to collect definition candidates and define reasonable features as the representation of a definition. In [16], a set of heuristic rules are used to mine possible candidates. First, all the paragraphs in a document collection are extracted. Second, the ⟨term⟩ of each paragraph is identified. Here ⟨term⟩ is defined as the first base noun phrase, or the combination of two base phrases separated by 'of' or 'for', in the first sentence of the paragraph. Third, those paragraphs containing the patterns of '⟨term⟩ is a/an/the ·', '⟨term⟩, ·, a/an/the', or '⟨term⟩ is one of' are selected as definition candidates.
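The candidate-mining heuristics above can be sketched in a few lines of Python. This is only an illustration under several assumptions that do not come from [16]: the names `PATTERNS` and `mine_candidates` are hypothetical, the '·' in the patterns is read as "arbitrary text", each pattern is required to match at the start of the paragraph, and the term is approximated by the words preceding the cue phrase rather than by a real base-noun-phrase chunker.

```python
import re

# Assumption: the term is approximated as the leading run of characters
# (no commas or periods) before the cue phrase. A real system would use a
# base-noun-phrase chunker on the first sentence instead.
PATTERNS = [
    # '<term> is a/an/the ...'
    re.compile(r"^(?P<term>[^,.]+?) is (?:a|an|the) ", re.IGNORECASE),
    # '<term>, ..., a/an/the ...'
    re.compile(r"^(?P<term>[^,.]+?), [^,.]+, (?:a|an|the) ", re.IGNORECASE),
    # '<term> is one of ...'
    re.compile(r"^(?P<term>[^,.]+?) is one of ", re.IGNORECASE),
]


def mine_candidates(paragraphs):
    """Return (term, paragraph) pairs whose opening matches a definition pattern."""
    candidates = []
    for paragraph in paragraphs:
        text = paragraph.strip()
        for pattern in PATTERNS:
            match = pattern.match(text)
            if match:
                candidates.append((match.group("term"), text))
                break  # one matching pattern is enough to keep the paragraph
    return candidates
```

Calling `mine_candidates` on a list of paragraph strings returns only those paragraphs that open with one of the three patterns, paired with the approximate term. The surviving paragraphs are the definition candidates that are ranked in the next step.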

Then, a set of features is extracted for each of these definition candidates. Specifically, the following features are used.

• ⟨term⟩ occurs at the beginning of a paragraph.
• ⟨term⟩ begins with 'the', 'a', or 'an'.
• All the words in ⟨term⟩ begin with uppercase letters.
• The paragraph contains predefined negative words, e.g., 'he', 'she', and 'said'.
• ⟨term⟩ contains pronouns.
• ⟨term⟩ contains 'of', 'for', 'and', 'or', ','.
• ⟨term⟩ re-occurs in the paragraph.
• ⟨term⟩ is followed by 'is a', 'is an', or 'is the'.
• Number of sentences in the paragraph.
• Number of words in the paragraph.
• Number of adjectives in the paragraph.
• Bag of words: words frequently occurring within a window after ⟨term⟩.

With this feature representation, a standard Ranking SVM algorithm [7, 8] is used to learn the optimal ranking function that combines these features to produce a ranking of the definition candidates. The above method has been tested on both intranet data and the “Gov” dataset used by TREC. The experimental results
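To make the feature representation and the pairwise training step concrete, the following Python sketch computes a small subset of the listed features and fits a linear ranking function with the pairwise hinge loss that underlies Ranking SVM. It is a simplified illustration, not the implementation of [16] or the solver of [7, 8]: the function names, the plain SGD optimizer, the hyperparameters, and the scaling of the word count by 100 are all assumptions made here for brevity.

```python
import random
import re

NEGATIVE_WORDS = {"he", "she", "said"}  # the examples named in the text


def extract_features(term, paragraph):
    """Map a (term, paragraph) candidate to a numeric vector (subset of the features)."""
    words = paragraph.split()
    lowered = paragraph.lower()
    t = term.lower()
    tokens = {w.strip(".,").lower() for w in words}
    occurrences = re.findall(r"\b" + re.escape(t) + r"\b", lowered)
    return [
        1.0 if lowered.startswith(t) else 0.0,                      # term starts the paragraph
        1.0 if t.split()[0] in {"the", "a", "an"} else 0.0,         # term begins with an article
        1.0 if all(w[0].isupper() for w in term.split()) else 0.0,  # all term words capitalized
        1.0 if NEGATIVE_WORDS & tokens else 0.0,                    # paragraph has negative words
        1.0 if len(occurrences) > 1 else 0.0,                       # term re-occurs
        len(words) / 100.0,  # number of words, scaled (assumed constant) for stable SGD
    ]


def train_ranking_svm(pairs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Learn w such that w.x_better exceeds w.x_worse by a margin.

    Minimizes the L2-regularized pairwise hinge loss with plain SGD,
    a simplification of a full Ranking SVM solver.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for i in range(dim):
                # Subgradient of reg/2 * |w|^2 + max(0, 1 - w.diff)
                grad = reg * w[i] - (diff[i] if margin < 1.0 else 0.0)
                w[i] -= lr * grad
    return w


def score(w, x):
    """Linear ranking score; candidates are sorted by this value."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

In use, each training pair consists of the feature vectors of a better and a worse definition candidate for the same term, and new candidates are sorted by `score` in descending order. The pairwise formulation only requires relative judgments between candidates, which matches the goal of ranking definitions rather than classifying them.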
