Table 6.2. Measures of agreement for the Thunderstorm and Coal texts between
the eight system evaluations and the human ratings of the self-explanations in
Experiment 1.

Thunderstorm Text  WB-ASSO  WB-TT  WB2-TT  LSA1   LSA2   LSA2/WB2-TT  TM1    TM2
Correlation        0.47     0.52   0.43    0.60   0.61   0.64         0.56   0.58
% Agreement        48%      50%    27%     55%    57%    62%          59%    60%
d' of 0's          2.21     2.26   0.97    2.13   2.19   2.21         1.49   2.37
d' of 1's          0.84     0.79   0.66    1.32   1.44   1.45         1.27   1.39
d' of 2's          0.23     0.36   -0.43   0.47   0.59   0.85         0.74   0.70
d' of 3's          1.38     1.52   1.41    1.46   1.48   1.65         1.51   1.41
Avg d'             1.17     1.23   0.65    1.34   1.43   1.54         1.25   1.23

Coal Text          WB-ASSO  WB-TT  WB2-TT  LSA1   LSA2   LSA2/WB2-TT  TM1    TM2
Correlation        0.51     0.47   0.41    0.66   0.67   0.71         0.63   0.61
% Agreement        41%      41%    29%     56%    57%    64%          61%    61%
d' of 0's          4.67     4.73   1.65    2.52   2.99   2.93         2.46   2.05
d' of 1's          1.06     0.89   0.96    1.21   1.29   1.50         1.38   1.52
d' of 2's          0.09     0.13   -0.37   0.45   0.49   0.94         0.74   0.61
d' of 3's          -0.16    1.15   1.28    1.59   1.59   1.79         1.60   1.50
Avg d'             1.42     1.73   0.88    1.44   1.59   1.79         1.54   1.42

well on the Thunderstorm and Coal texts, there is a high level of agreement for
the LSA-based formulas (i.e., the results are virtually identical in the two tables).
This indicates that if we were to apply the word-based formulas to yet another text,
we would have less assurance of finding the same performance, whereas the LSA-based
formulas are more likely to replicate across texts.
Figure 6.1.a provides a closer look at the data for the combined automated
system, LSA2/WB2-TT, and Figure 6.1.b does the same for the TM2 system. As the d'
values indicated, both systems perform quite well for explanations that were given
human ratings of 0, 1, or 3. Thus, the systems successfully identify poor
explanations, paraphrases, and very good explanations. They are less successful at
identifying explanations given a human rating of 2, which consist of paraphrases in
addition to some information from the previous sentence or from world knowledge. As
one might expect, some of these are classified as paraphrases and some as global by
the system. Although the result is not perfect, we consider it a success because so
few were misclassified as poor explanations.
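
For readers unfamiliar with d', the following is a minimal sketch of how a per-rating d' could be computed from paired human and system ratings. It assumes the standard signal-detection definition, d' = z(hit rate) - z(false-alarm rate), computed separately for each rating category; the text here does not state the exact procedure used for Table 6.2. The function name, the log-linear correction for extreme rates, and the example ratings are all illustrative assumptions, not data or code from the study.

```python
from statistics import NormalDist


def d_prime_for_rating(human, system, rating):
    """Sketch of d' for one rating category (assumed standard definition).

    A "hit" is an item the human rated `rating` that the system also
    rated `rating`; a "false alarm" is an item the human rated otherwise
    that the system rated `rating`.
    """
    hits = misses = false_alarms = correct_rejections = 0
    for h, s in zip(human, system):
        if h == rating and s == rating:
            hits += 1
        elif h == rating:
            misses += 1
        elif s == rating:
            false_alarms += 1
        else:
            correct_rejections += 1

    # Assumption: log-linear correction (add 0.5 to each count, 1 to each
    # total) so that rates of exactly 0 or 1 do not give an infinite z-score.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical ratings on the 0-3 scale, for illustration only.
human_ratings = [0, 0, 1, 1, 2, 2, 3, 3, 1, 3]
system_ratings = [0, 0, 1, 1, 1, 3, 3, 3, 1, 2]

for r in range(4):
    print(r, round(d_prime_for_rating(human_ratings, system_ratings, r), 2))
```

Under this definition, a d' near zero for a category means the system assigns that rating about as often to items the humans rated differently as to items they rated the same, and a negative d' means it does so more often.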
6.3.2 Experiment 2
Self-Explanations. The self-explanations were collected from 45 middle-school
students (entering 8th and 9th grades) who were provided with iSTART training and
then tested with two texts, Thunderstorm and Coal. The texts were shortened
versions of the texts used in Experiment 1, consisting of 13 and 12 sentences,
respectively. This chapter presents only the data from the Coal text.
 