Generating Comprehension Questions Using Paraphrase - Technologies and Applications of Artificial Intelligence

while false ones, or distractors, may be true when the transformation is based on weak discourse relations or on phrases with similar meaning.

Table 5 shows the evaluation results of the baseline and our system. The baseline system attains better overall quality. This matches our prediction: our system integrates multiple components (the simplification system, the paraphrase generation system and the question generation system), each of which used to be an independent system and has its own distinctive errors. The errors that these components introduce lower the overall quality as well as the grammaticality and make-sense scores. Still, it is encouraging that the decrease in these scores is slight and that the average difficulty of the choices is higher. The challenging score increased, but not as much as we expected. This might be because the discourse-based rules are much less productive than the SST-based ones. The top-5 choices that we evaluated are overwhelmingly occupied by the SST-based choices, which are on average not as difficult as those that involve discourse relations.

Table 4. Number of intended and actual TRUE/FALSE

                  Actual TRUE   Actual FALSE   Total
Intended TRUE     257 (41%)     16 (3%)        273
Intended FALSE    90 (14%)      264 (42%)      354
Total             347           280            627
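The percentages in Table 4 can be re-derived from the raw counts. The following is a minimal sketch in Python, not part of the original system: the names `counts`, `cell_pct`, `agreement` and `row_total` are illustrative, and the agreement rate is not reported in the text but computed here from the table.

```python
# Counts copied from Table 4, keyed as (intended label, actual label).
# All names in this sketch are illustrative assumptions.
counts = {
    ("TRUE", "TRUE"): 257,
    ("TRUE", "FALSE"): 16,
    ("FALSE", "TRUE"): 90,
    ("FALSE", "FALSE"): 264,
}

total = sum(counts.values())

# Each cell as a rounded percentage of all evaluated statements.
cell_pct = {key: round(100 * n / total) for key, n in counts.items()}

# Share of statements whose actual label matches the intended one
# (derived here from the counts, not a figure from the paper).
agreement = (counts[("TRUE", "TRUE")] + counts[("FALSE", "FALSE")]) / total


def row_total(intended: str) -> int:
    """Number of statements generated with the given intended label."""
    return sum(n for (i, _), n in counts.items() if i == intended)


print(total, cell_pct)
print(f"agreement: {agreement:.1%}")
print(row_total("TRUE"), row_total("FALSE"))
```

Most of the disagreement comes from the 90 intended-FALSE statements judged TRUE, which is consistent with the remark that distractors may turn out to be true.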

Table 5. Extrinsic evaluation results

              Grammaticality (1-5)   Make-sense (1-3)   Challenging score (1-3)   Overall quality (1-5)   Unchanged sentences
Baseline      4.86                   2.5                1.2                       3.76                    38.10%
Our system    4.22                   2.39               1.51                      3.53                    8.57%

The statistics also suggest that our system generates statements with more variation. The percentage of unchanged sentences is 38.1% for the baseline system, while only 8.57% of the sentences in our system output are identical to their source counterparts. Keeping a source sentence intact is sure to produce a grammatically perfect statement, which might be an easy test choice. Conversely, changing most of the source sentences could be expected to substantially degrade quality and grammaticality, but our Acceptability Ranker maintained the good quality of the top-ranked choices.
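To make the comparison in Table 5 concrete, the differences between the two systems can be computed directly. This is a small sketch under stated assumptions: the dictionary keys and the `delta` name are illustrative, and the differences are derived from the table rather than reported in the text.

```python
# Scores copied from Table 5; keys are illustrative shorthand for the columns.
baseline = {
    "grammaticality": 4.86,
    "make_sense": 2.5,
    "challenging": 1.2,
    "overall": 3.76,
    "unchanged_pct": 38.10,
}
ours = {
    "grammaticality": 4.22,
    "make_sense": 2.39,
    "challenging": 1.51,
    "overall": 3.53,
    "unchanged_pct": 8.57,
}

# Difference (our system minus baseline), rounded to two decimals.
delta = {name: round(ours[name] - baseline[name], 2) for name in baseline}

for name, value in delta.items():
    print(f"{name:>15}: {value:+.2f}")
```

A positive value means our system scores higher than the baseline on that measure; for `unchanged_pct`, a negative value means fewer output sentences are identical to the source.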

5 Conclusion

In this paper, we presented a novel approach to generating statements for multiple-choice reading comprehension questions. By exploiting discourse relations, our system creates
