as requiring more complex natural language processing (NLP) techniques than
document retrieval, and natural language search engines are sometimes regarded as the
next-generation search engines.
In this section, we will review the use of learning-to-rank technologies in several
QA tasks, including definitional QA, quantity consensus QA, non-factoid QA, and
Why QA.
14.2.1 Definitional QA
Definitional QA is a specific task in the TREC-QA track. Given a question of the form
“what is X” or “who is X”, one extracts answers from multiple documents and
combines the extracted answers into a single unified answer. QA is ideal as a means
of helping people find definitions. However, it might be difficult to realize in
practice. Usually definitions extracted from different documents describe the term
from different perspectives, and thus it is not easy to combine them. A more practical
way of dealing with the problem is to rank the extracted definitions according to
their likelihood of being good definitions, which is called definition search [16].
For this purpose, the first step is to collect definition candidates and define
reasonable features as the representation of a definition. In [16], a set of heuristic rules
are used to mine possible candidates. First, all the paragraphs in a document
collection are extracted. Second, the <term> of each paragraph is identified. Here <term> is
defined as the first base noun phrase, or the combination of two base phrases
separated by 'of' or 'for' in the first sentence of the paragraph. Third, those paragraphs
containing the patterns of '<term> is a/an/the ·', '<term>, ·, a/an/the', or '<term> is
one of ·' are selected as definition candidates.
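The three candidate patterns above can be sketched as regular expressions. This is only an illustration, not the implementation used in [16]: the function names are hypothetical, the term is assumed to have been identified already (finding the first base noun phrase would need a chunker, which is omitted here), and reading '·' as "any text" is an assumption.

```python
import re

def candidate_patterns(term):
    """Build the three surface patterns for a given, already identified term."""
    t = re.escape(term)  # treat the term as literal text, not as a regex
    return [
        # '<term> is a/an/the ...'
        re.compile(rf"\b{t}\s+is\s+(?:a|an|the)\b", re.IGNORECASE),
        # '<term>, ..., a/an/the'
        re.compile(rf"\b{t}\s*,[^,]+,\s*(?:a|an|the)\b", re.IGNORECASE),
        # '<term> is one of ...'
        re.compile(rf"\b{t}\s+is\s+one\s+of\b", re.IGNORECASE),
    ]

def is_definition_candidate(term, paragraph):
    """Return True if the paragraph contains at least one of the patterns."""
    return any(p.search(paragraph) for p in candidate_patterns(term))
```

A paragraph such as "Linux is one of the most widely used kernels." would be kept as a candidate for the term "Linux", while "He said Linux runs on many servers." would be filtered out. The leading `\b` assumes the term starts with a word character.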
Then, a set of features are extracted for each of these definition candidates.
Specifically, the following features are used.
· <term> occurs at the beginning of a paragraph.
· <term> begins with 'the', 'a', or 'an'.
· All the words in <term> begin with uppercase letters.
· The paragraph contains predefined negative words, e.g., 'he', 'she', and 'said'.
· <term> contains pronouns.
· <term> contains 'of', 'for', 'and', 'or', ','.
· <term> re-occurs in the paragraph.
· <term> is followed by 'is a', 'is an', or 'is the'.
· Number of sentences in the paragraph.
· Number of words in the paragraph.
· Number of the adjectives in the paragraph.
· Bag of words: words frequently occurring within a window after <term>.
With this feature representation, a standard Ranking SVM algorithm [7, 8] is
used to learn the optimal ranking function to combine these features in order to
produce a ranking for the definition candidates. The above method has been tested
on both intranet data and the “Gov” dataset used by TREC. The experimental results
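As a rough illustration of the feature representation and the ranking step described above, the sketch below computes a subset of the listed features and fits a linear ranking function on preference pairs. It rests on several assumptions: the names `extract_features`, `train_ranking_svm`, and `score` are hypothetical, the pronoun list is a guess, the adjective count and bag-of-words features are left out because they need a part-of-speech tagger and corpus statistics, and the trainer uses plain subgradient descent on the pairwise hinge loss rather than the solver used in [7, 8].

```python
import re

NEGATIVE_WORDS = {"he", "she", "said"}               # examples named in the text
PRONOUNS = {"he", "she", "it", "they", "we", "you"}  # assumed list
ARTICLES = {"the", "a", "an"}
CONNECTIVES = {"of", "for", "and", "or"}

def extract_features(term, paragraph):
    """Map one (term, paragraph) candidate to a list of numeric features."""
    words = re.findall(r"[A-Za-z']+", paragraph.lower())
    term_words = term.split()  # assumes a non-empty term
    term_lower = [w.lower().strip(",") for w in term_words]
    followed = re.search(
        rf"{re.escape(term)}\s+is\s+(?:a|an|the)\b", paragraph, re.IGNORECASE
    )
    return [
        float(paragraph.lstrip().startswith(term)),        # term opens the paragraph
        float(term_lower[0] in ARTICLES),                  # term begins with an article
        float(all(w[0].isupper() for w in term_words)),    # all words capitalized
        float(any(w in NEGATIVE_WORDS for w in words)),    # negative words present
        float(any(w in PRONOUNS for w in term_lower)),     # term contains pronouns
        float("," in term or any(w in CONNECTIVES for w in term_lower)),
        float(paragraph.count(term) > 1),                  # term re-occurs
        float(followed is not None),                       # followed by 'is a/an/the'
        float(len(re.findall(r"[.!?]+", paragraph))),      # rough sentence count
        float(len(words)),                                 # word count
    ]

def train_ranking_svm(X, y, epochs=50, lr=0.1, reg=0.01):
    """Fit w so that w.x_i exceeds w.x_j by a margin whenever y[i] > y[j]."""
    w = [0.0] * len(X[0])
    pairs = [(i, j) for i in range(len(X)) for j in range(len(X)) if y[i] > y[j]]
    for _ in range(epochs):
        for i, j in pairs:
            diff = [a - b for a, b in zip(X[i], X[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            w = [wk * (1.0 - lr * reg) for wk in w]        # L2 shrinkage
            if margin < 1.0:                               # pair violates the margin
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w

def score(w, x):
    """Linear ranking score; candidates are sorted by this value."""
    return sum(wk * xk for wk, xk in zip(w, x))
```

Candidates would then be sorted by `score(w, extract_features(term, paragraph))` in descending order. In practice the count features would likely need scaling so that they do not dominate the binary ones, and preference pairs would normally be formed only among candidates for the same term.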