Paraphrase extraction focuses on approaches that automatically acquire paraphrases from corpora, while paraphrase generation produces paraphrases for any input sentence.
Table 2. Paraphrase resources and likelihood

Alias | Resource                       | Paraphrase likelihood
PT-1  | PPDB lexical paraphrase        |
PT-2  | PPDB phrasal paraphrase        |
PT-3  | PPDB syntactic paraphrase      |
PT-4  | WordNet synonyms/entailments   |
PT-5  | Inference rules for predicates |
PT-6  | Nominal Coreference            | Representative mentions: ... ; Other mentions: ...
PT-7  | Self (see footnote 1)          |
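To make the layout of these resources concrete, the following is a minimal sketch, assuming nothing beyond Table 2, of how the seven paraphrase tables could be held in memory as dictionaries mapping a source segment to candidate substitutes with a likelihood score. The names (paraphrase_tables, add_entry, self_table) and the self-likelihood of 1.0 are illustrative assumptions, not details taken from the cited systems.

```python
# A minimal sketch of one possible in-memory layout for the paraphrase tables
# in Table 2; names and toy likelihood values are illustrative assumptions.
from collections import defaultdict

# e.g. paraphrase_tables["PT-2"]["pass away"] -> [("die", 0.87), ...]  (toy numbers)
paraphrase_tables = defaultdict(lambda: defaultdict(list))

def add_entry(alias, source_segment, substitute, likelihood):
    """Register one paraphrase pair harvested from a resource
    (PPDB, WordNet, inference rules, coreference chains, ...)."""
    paraphrase_tables[alias][source_segment].append((substitute, likelihood))

def self_table(sentence_tokens):
    """PT-7: built dynamically per input sentence (footnote 1), so that every
    word can map onto itself when no better substitute is available."""
    return {tok: [(tok, 1.0)] for tok in sentence_tokens}
```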
Among the many paraphrase generation frameworks, we favor the idea proposed in [23] of combining multiple paraphrase resources, which allows us to flexibly introduce application-specific resources into the framework. We incorporate pairs of mentions extracted from the same coreference chain as paraphrases, which has not been exploited in existing paraphrase generation systems because they do not consider article-level information. Besides coreference, resources such as the ParaPhrase DataBase (PPDB) [5], WordNet, and context-sensitive inference rules for predicates [12] are also included. These resources provide a diversity of paraphrases, ranging from lexical, phrasal, and syntactic to referential. For any input sentence, the paraphrase planning phase in Fig. 3 cuts the sentence into segments and transforms them into the search patterns of each resource. It outputs all possible paraphrases for every segment in the input sentence. In the next phase, to form a paraphrased sentence from all possible substitutes, we use a log-linear model [22] to score the combination:
pt|sāˆ‘ āˆ‘ ln ,
āˆ‘ ln
(1)
In Equation 1, s represents the source sentence and t is the target sentence. K is the total number of paraphrase tables and J is the order of the J-gram language model. φ_k(s_i, t_i) is the sum of the paraphrase likelihood scores of the substitutes for the i-th segment that are found in PT-k. The likelihood scores for each PT are defined in Table 2. The second part of the sum is the J-gram (J = 3) language model score of t and is retrieved via the Microsoft web n-gram service (see footnote 2). λ_k and γ are the parameters that represent the weights of the sub-scores. The calculation reduces to the Viterbi algorithm, so the top-scoring target sentences can be found efficiently.
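As a concrete illustration of Equation 1, the sketch below decodes one fixed segmentation of the input with a Viterbi-style dynamic program. It assumes an in-memory language-model callable (lm_logprob) in place of the Microsoft web n-gram service, illustrative weights λ_k (weights[alias]) and γ (gamma), and it scores each candidate with the single table it came from rather than summing over all tables; it is a simplification under these assumptions, not the implementation of [22] or [23].

```python
import math

J = 3  # trigram language model, as in the text (J = 3)

def decode(segments, paraphrase_tables, weights, gamma, lm_logprob):
    """Viterbi-style search for a high-scoring target sentence under a
    simplified form of Equation 1.

    segments           -- list of source segments s_1 .. s_I (one fixed segmentation)
    paraphrase_tables  -- {alias: {segment: [(substitute, phi), ...]}}
    weights            -- {alias: lambda_k}; gamma weights the language-model score
    lm_logprob(w, ctx) -- ln p(w | ctx), ctx being up to J-1 preceding tokens
    """
    # one hypothesis per language-model context: (last J-1 tokens) -> (score, tokens)
    hyps = {(): (0.0, [])}
    for seg in segments:
        # collect substitutes for this segment from every paraphrase table
        cands = []
        for alias, table in paraphrase_tables.items():
            for sub, phi in table.get(seg, []):
                cands.append((sub, weights[alias] * math.log(max(phi, 1e-12))))
        if not cands:
            cands = [(seg, 0.0)]  # self table: keep the segment unchanged
        new_hyps = {}
        for ctx, (score, toks) in hyps.items():
            for sub, pt_score in cands:
                new_toks = toks + sub.split()
                # language-model score of the newly appended tokens
                lm_score = 0.0
                for i in range(len(toks), len(new_toks)):
                    context = tuple(new_toks[max(0, i - J + 1):i])
                    lm_score += lm_logprob(new_toks[i], context)
                total = score + pt_score + gamma * lm_score
                new_ctx = tuple(new_toks[-(J - 1):])
                # keep only the best hypothesis per language-model context
                if new_ctx not in new_hyps or total > new_hyps[new_ctx][0]:
                    new_hyps[new_ctx] = (total, new_toks)
        hyps = new_hyps
    best_score, best_toks = max(hyps.values(), key=lambda h: h[0])
    return " ".join(best_toks), best_score

# Toy usage with a uniform language model (every word equally likely):
best, score = decode(["the automobile", "was bought"],
                     {"PT-2": {"the automobile": [("the car", 0.9)]}},
                     weights={"PT-2": 1.0}, gamma=0.5,
                     lm_logprob=lambda w, ctx: math.log(1e-4))
```

Keeping only the best partial sentence per language-model context is what makes the search tractable: the number of hypotheses stays bounded by the number of distinct (J-1)-token contexts rather than growing with every combination of substitutes.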
1 The self-table is created dynamically for each word in the input sentence. This allows words
in the sentence to remain unchanged when there is no better substitute.
2 http://weblm.research.microsoft.com/