Evaluating Self-Explanations in iSTART: Word Matching, Latent Semantic Analysis, and Topic Models - Natural Language Processing and Text Mining


Table 6.2. Measures of agreement for the Thunderstorm and Coal texts between the eight system evaluations and the human ratings of the self-explanations in Experiment 1.

Thunderstorm Text   WB-ASSO  WB-TT  WB2-TT  LSA1  LSA2  LSA2/WB2-TT  TM1   TM2
Correlation         0.47     0.52   0.43    0.60  0.61  0.64         0.56  0.58
% Agreement         48%      50%    27%     55%   57%   62%          59%   60%
d' of 0's           2.21     2.26   0.97    2.13  2.19  2.21         1.49  2.37
d' of 1's           0.84     0.79   0.66    1.32  1.44  1.45         1.27  1.39
d' of 2's           0.23     0.36   -0.43   0.47  0.59  0.85         0.74  0.70
d' of 3's           1.38     1.52   1.41    1.46  1.48  1.65         1.51  1.41
Avg d'              1.17     1.23   0.65    1.34  1.43  1.54         1.25  1.23

Coal Text           WB-ASSO  WB-TT  WB2-TT  LSA1  LSA2  LSA2/WB2-TT  TM1   TM2
Correlation         0.51     0.47   0.41    0.66  0.67  0.71         0.63  0.61
% Agreement         41%      ?      29%     56%   57%   64%          61%   61%
d' of 0's           4.67     4.73   1.65    2.52  2.99  2.93         2.46  2.05
d' of 1's           1.06     0.89   0.96    1.21  1.29  1.50         1.38  1.52
d' of 2's           0.09     0.13   -0.37   0.45  0.49  0.94         0.74  0.61
d' of 3's           -0.16    1.15   1.28    1.59  ?     1.79         1.60  1.50
Avg d'              1.42     1.73   0.88    1.44  1.59  1.79         1.54  1.42

? = value not available.

well on the Thunderstorm and Coal texts, there is a high level of agreement for the LSA-based formulas (i.e., the results are virtually identical in the two tables). This indicates that if we were to apply the word-based formulas to yet another text, we would have less assurance of obtaining the same performance, whereas the LSA-based formulas are more likely to replicate across texts.

Figure 6.1.a provides a closer look at the data for the combined automated system, LSA2/WB2-TT, and Figure 6.1.b does the same for the TM2 system. As the d' values indicate, both systems perform quite well on explanations that humans rated 0, 1, or 3. Thus, the systems successfully identify poor explanations, paraphrases, and very good explanations. They are less successful at identifying explanations that consist of a paraphrase plus some information from the previous sentence or from world knowledge (rated 2). As one might expect, the systems classify some of these as paraphrases and some as global explanations. Although this is not perfect, we consider the result a success because so few of these explanations were misclassified as poor.
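One way to take such a closer look is to cross-tabulate system scores against human scores. The sketch below takes the explanations that humans rated 2 and reports what share of them the system assigned to each category, including the share misclassified as poor. The function name `breakdown`, the category labels, and the toy ratings are hypothetical illustrations, not the chapter's data or code.

```python
from collections import Counter

# Labels follow the human rating scale described in the text.
LABELS = {0: "poor", 1: "paraphrase", 2: "paraphrase plus extra", 3: "global"}


def breakdown(human: list[int], system: list[int], k: int) -> dict[str, float]:
    """Share of each system score among explanations that humans rated k."""
    counts = Counter(s for h, s in zip(human, system) if h == k)
    total = sum(counts.values())
    return {LABELS[score]: counts[score] / total for score in sorted(counts)}


# Hypothetical toy ratings, NOT data from the chapter.
human = [2, 2, 2, 2, 2, 2, 2, 2, 1, 3]
system = [1, 2, 3, 1, 2, 3, 3, 0, 1, 3]

shares = breakdown(human, system, 2)
for label, share in shares.items():
    print(f"{label:<22} {share:.1%}")
# The text's success criterion: few rating-2 explanations are scored as poor.
print(f"misclassified as poor: {shares.get('poor', 0.0):.1%}")
```

In this toy example, one of the eight rating-2 explanations is scored as poor, and the rest are split among paraphrase, the correct category, and global, which is the kind of pattern the text describes.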

6.3.2 Experiment 2

Self-Explanations. The self-explanations were collected from 45 middle-school students (entering 8th and 9th grades) who received iSTART training and were then tested with two texts, Thunderstorm and Coal. The texts were shortened versions of those used in Experiment 1, consisting of 13 and 12 sentences, respectively. This chapter presents only the data from the Coal text.
