Paraphrase extraction focuses on approaches that automatically acquire paraphrases from corpora, while paraphrase generation produces paraphrases for any input sentence.
Table 2. Paraphrase resources and likelihood

Alias | Resource                       | Paraphrase likelihood
PT-1  | PPDB lexical paraphrase        |
PT-2  | PPDB phrasal paraphrase        |
PT-3  | PPDB syntactic paraphrase      |
PT-4  | WordNet synonyms/entailments   |
PT-5  | Inference rules for predicates |
PT-6  | Nominal Coreference            | Representative mentions: ... ; Other mentions: ...
PT-7  | Self (see footnote 1)          |
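To make the layout of these resources concrete, the following is a minimal sketch, assuming nothing beyond Table 2, of how the seven paraphrase tables could be held in memory as dictionaries mapping a source segment to candidate substitutes with a likelihood score. The names (paraphrase_tables, add_entry, self_table) and the self-likelihood of 1.0 are illustrative assumptions, not details taken from the cited systems.

```python
# A minimal sketch of one possible in-memory layout for the paraphrase tables
# in Table 2; names and toy likelihood values are illustrative assumptions.
from collections import defaultdict

# e.g. paraphrase_tables["PT-2"]["pass away"] -> [("die", 0.87), ...]  (toy numbers)
paraphrase_tables = defaultdict(lambda: defaultdict(list))

def add_entry(alias, source_segment, substitute, likelihood):
    """Register one paraphrase pair harvested from a resource
    (PPDB, WordNet, inference rules, coreference chains, ...)."""
    paraphrase_tables[alias][source_segment].append((substitute, likelihood))

def self_table(sentence_tokens):
    """PT-7: built dynamically per input sentence (footnote 1), so that every
    word can map onto itself when no better substitute is available."""
    return {tok: [(tok, 1.0)] for tok in sentence_tokens}
```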
Among the many paraphrase generation frameworks, we favor the idea proposed in [23] of combining multiple paraphrase resources, which allows us to flexibly introduce application-specific resources into the framework. We incorporate pairs of mentions extracted from the same coreference chain as paraphrases, which has not been exploited in existing paraphrase generation systems because they do not consider article-level information. Besides coreference, resources such as the ParaPhrase DataBase (PPDB) [5], WordNet, and context-sensitive inference rules for predicates [12] are also included. These resources provide a diversity of paraphrases, ranging from lexical, phrasal, and syntactic to referential. For any input sentence, the paraphrase planning phase in Fig. 3 cuts the sentence into segments and transforms them into the search patterns of each resource. It outputs all possible paraphrases for every segment in the input sentence. In the next phase, to form a paraphrased sentence from all possible substitutes, we use a log-linear model [22] to score the combination:
pt|sāˆ‘ āˆ‘ ln ,
āˆ‘ ln
(1)
In Equation 1, s represents the source sentence and t is the target sentence. K is the total number of paraphrase tables and J is the order of the J-gram language model. φ_k(s_i, t_i) is the sum of the paraphrase likelihood scores of the substitutes for the i-th segment that are found in PT-k. The likelihood scores for each PT are defined in Table 2. The second part of the sum is the J-gram (J = 3) language model score of t and is retrieved via the Microsoft web n-gram service (see footnote 2). λ_k and γ are the parameters that represent the weights of the sub-scores. The calculation reduces to the Viterbi algorithm, so the top-scoring target sentences can be found efficiently.
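As a concrete illustration of Equation 1, the sketch below decodes one fixed segmentation of the input with a Viterbi-style dynamic program. It assumes an in-memory language-model callable (lm_logprob) in place of the Microsoft web n-gram service, illustrative weights λ_k (weights[alias]) and γ (gamma), and it scores each candidate with the single table it came from rather than summing over all tables; it is a simplification under these assumptions, not the implementation of [22] or [23].

```python
import math

J = 3  # trigram language model, as in the text (J = 3)

def decode(segments, paraphrase_tables, weights, gamma, lm_logprob):
    """Viterbi-style search for a high-scoring target sentence under a
    simplified form of Equation 1.

    segments           -- list of source segments s_1 .. s_I (one fixed segmentation)
    paraphrase_tables  -- {alias: {segment: [(substitute, phi), ...]}}
    weights            -- {alias: lambda_k}; gamma weights the language-model score
    lm_logprob(w, ctx) -- ln p(w | ctx), ctx being up to J-1 preceding tokens
    """
    # one hypothesis per language-model context: (last J-1 tokens) -> (score, tokens)
    hyps = {(): (0.0, [])}
    for seg in segments:
        # collect substitutes for this segment from every paraphrase table
        cands = []
        for alias, table in paraphrase_tables.items():
            for sub, phi in table.get(seg, []):
                cands.append((sub, weights[alias] * math.log(max(phi, 1e-12))))
        if not cands:
            cands = [(seg, 0.0)]  # self table: keep the segment unchanged
        new_hyps = {}
        for ctx, (score, toks) in hyps.items():
            for sub, pt_score in cands:
                new_toks = toks + sub.split()
                # language-model score of the newly appended tokens
                lm_score = 0.0
                for i in range(len(toks), len(new_toks)):
                    context = tuple(new_toks[max(0, i - J + 1):i])
                    lm_score += lm_logprob(new_toks[i], context)
                total = score + pt_score + gamma * lm_score
                new_ctx = tuple(new_toks[-(J - 1):])
                # keep only the best hypothesis per language-model context
                if new_ctx not in new_hyps or total > new_hyps[new_ctx][0]:
                    new_hyps[new_ctx] = (total, new_toks)
        hyps = new_hyps
    best_score, best_toks = max(hyps.values(), key=lambda h: h[0])
    return " ".join(best_toks), best_score

# Toy usage with a uniform language model (every word equally likely):
best, score = decode(["the automobile", "was bought"],
                     {"PT-2": {"the automobile": [("the car", 0.9)]}},
                     weights={"PT-2": 1.0}, gamma=0.5,
                     lm_logprob=lambda w, ctx: math.log(1e-4))
```

Keeping only the best partial sentence per language-model context is what makes the search tractable: the number of hypotheses stays bounded by the number of distinct (J-1)-token contexts rather than growing with every combination of substitutes.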
1 The self-table is created dynamically for each word in the input sentence. This allows words
in the sentence to remain unchanged when there is no better substitute.
2 http://weblm.research.microsoft.com/