while false ones, or distractors, may be true when the transformation is based on weak
discourse relations or on phrases with similar meaning.
Table 5 shows the evaluation results of the baseline and our system. The baseline
system attains better overall quality. This matches our prediction: our system
integrates multiple components (the simplification system, the paraphrase generation
system and the question generation system), each of which was originally an
independent system with its own distinctive errors. The errors that these components
introduce lower the overall quality as well as the grammaticality and make-sense
scores. Still, it is encouraging that the decrease in these scores is slight while
the average difficulty of the choices is higher. The challenging score increased,
but not as much as we expected. This might be because the discourse-based rules are
much less productive than the SST-based ones. The top-5 choices that we evaluated
are overwhelmingly occupied by SST-based choices, which are on average not as
difficult as those that involve discourse relations.
Table 4. Number of intended and actual TRUE/FALSE

                 Actual TRUE   Actual FALSE   Total
Intended TRUE    257 (41%)     16 (3%)        273
Intended FALSE   90 (14%)      264 (42%)      354
Total            347           280            627
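The counts in Table 4 can be summarized as an agreement rate between the intended and the actual truth values. The following is a minimal sketch of that arithmetic; the names `counts`, `agreement_rate` and `row_rate` are illustrative assumptions and not part of the system described here.

```python
# Counts copied from Table 4, keyed by (intended, actual).
# All names below are hypothetical and used only for illustration.
counts = {
    ("TRUE", "TRUE"): 257,
    ("TRUE", "FALSE"): 16,
    ("FALSE", "TRUE"): 90,
    ("FALSE", "FALSE"): 264,
}

# Total number of evaluated statements.
total = sum(counts.values())

# Statements whose actual truth value matches the intended one.
agree = sum(n for (intended, actual), n in counts.items() if intended == actual)
agreement_rate = agree / total


def row_rate(intended):
    """Share of statements with the given intended value that kept that value."""
    row_total = sum(n for (i, _), n in counts.items() if i == intended)
    return counts[(intended, intended)] / row_total


print(f"overall agreement: {agreement_rate:.1%}")
print(f"intended TRUE kept TRUE: {row_rate('TRUE'):.1%}")
print(f"intended FALSE kept FALSE: {row_rate('FALSE'):.1%}")
```

On these counts, 521 of 627 statements (about 83.1%) have the intended truth value. Roughly 94.1% of the intended-TRUE statements are actually true, while only about 74.6% of the intended-FALSE statements are actually false, so most of the disagreement comes from distractors that turn out to be true.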
Table 5. Extrinsic evaluation results

             Grammaticality   Make-sense   Challenging    Overall         Unchanged
             (1-5)            (1-3)        score (1-3)    quality (1-5)   sentences
Baseline     4.86             2.5          1.2            3.76            38.10%
Our system   4.22             2.39         1.51           3.53            8.57%
The statistics also suggest that our system generates statements with more variation.
The percentage of unchanged sentences is 38.1% for the baseline system, while only
8.57% of the sentences in our system's output are identical to their source
counterparts. Keeping a source sentence intact is sure to produce a grammatically
perfect statement, but such a statement might be an easy test choice. Conversely,
changing most of the source sentences could be expected to reduce quality and
grammaticality substantially, but our Acceptability Ranker has succeeded in
maintaining the quality of the top-ranked choices.
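The "unchanged sentences" statistic can be illustrated with a short sketch. The text does not say how identity between an output and its source was determined, so the version below assumes exact string equality after collapsing whitespace; the function name `unchanged_percentage` and the toy sentences are hypothetical.

```python
def unchanged_percentage(sources, outputs):
    """Percentage of outputs identical to their paired source sentence.

    Assumption: "identical" means equal strings after whitespace is
    collapsed. The original evaluation may have used another criterion.
    """
    if len(sources) != len(outputs):
        raise ValueError("sources and outputs must be paired one-to-one")
    if not sources:
        return 0.0

    def normalize(sentence):
        # Collapse runs of whitespace and strip leading/trailing spaces.
        return " ".join(sentence.split())

    same = sum(normalize(s) == normalize(o) for s, o in zip(sources, outputs))
    return 100.0 * same / len(sources)


# Toy data, invented purely for illustration.
toy_sources = ["The cat sat on the mat.", "It rained because a storm came."]
toy_outputs = ["The cat sat on the  mat.", "A storm came, so it rained."]
print(unchanged_percentage(toy_sources, toy_outputs))
```

A stricter or looser notion of identity (for example, ignoring case or punctuation) would change the reported percentage, so the criterion should be stated alongside the figure.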
5 Conclusion
In this paper, we presented a novel approach to generating statements for multiple-choice
reading comprehension questions. By exploiting discourse relations, our system creates