evaluation of the explanation as irrelevant or too short; 1, minimally acceptable; 2,
better but including primarily the local textual context; and 3, oriented to a more
global comprehension. Depending on the text, population, and LSA space used, our
results have ranged from 55 to 70 percent agreement with expert evaluations using
that scale. We are currently attempting to improve the effectiveness of our algorithms by incorporating Topic Models (TM), either in place of or in conjunction with LSA, and by using more than one LSA space, each derived from a different genre (science, narrative, and the general TASA corpus). We present some of the results of these efforts in this chapter.
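The chapter does not give a formula for combining evidence from several semantic spaces; the Python sketch below shows one plausible scheme, a weighted average of per-space similarity scores. The Space class, its sim method, and the uniform default weights are hypothetical stand-ins, not the authors' actual method.

```python
from dataclasses import dataclass

@dataclass
class Space:
    """Stand-in for one semantic space (an LSA space or a Topic Model)."""
    name: str

    def sim(self, explanation: str, benchmark: str) -> float:
        # Placeholder: a real space would project both texts and return
        # their similarity; a constant keeps the sketch runnable.
        return 0.5

def combined_score(explanation, benchmark, spaces, weights=None):
    """Hypothetical weighted combination of per-space similarities.
    Uniform weighting is an assumption, not the chapter's formula."""
    weights = weights or [1.0 / len(spaces)] * len(spaces)
    return sum(w * s.sim(explanation, benchmark)
               for w, s in zip(weights, spaces))

spaces = [Space("science"), Space("narrative"), Space("general TASA")]
print(combined_score("student explanation", "benchmark words", spaces))  # ~0.5
```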
Our algorithms are constrained by two major requirements: fast response times and the rapid introduction of new texts. Because the trainer operates in real time, the server that calculates the evaluation must respond within 4 to 5 seconds. Furthermore, the algorithms must not require any significant preparation of new texts, a requirement precisely contrary to our plans when the project began. To accommodate the teachers whose classes use iSTART, the trainer must be able to incorporate, within a day or two, any text a teacher wishes to use for student practice. This time limit precludes us from significantly marking up the text or from gathering related texts to incorporate into an LSA corpus.
In addition to the overall 4-point quality score, we are attempting to expand our evaluation to include an assessment of the presence of various reading strategies in the student's explanation so that we can generate more specific feedback. If the system were able to detect whether an explanation uses paraphrasing, bridging, or elaboration, we could provide more detailed feedback to the students, as well as an individualized curriculum based on a more complete model of the student. For example, if the system assessed that a student only paraphrased sentences while self-explaining, and never used strategies such as making bridging inferences or knowledge-based elaborations, then the student could be given additional training in generating more inference-based explanations.
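As a rough illustration of what such strategy detection might look like, the sketch below compares an explanation's similarity to the current sentence, to prior sentences, and to material beyond the text, and maps the pattern of scores to strategy labels. The function name, its three inputs, and the 0.5 threshold are illustrative assumptions, not the iSTART rules.

```python
def classify_strategies(sim_current, sim_prior, sim_outside, threshold=0.5):
    """Hypothetical rule-based mapping from similarity scores to labels.

    sim_current: similarity to the current-sentence benchmark (paraphrase)
    sim_prior:   similarity to prior-sentence benchmarks (bridging)
    sim_outside: similarity to material beyond the text (elaboration)
    """
    labels = []
    if sim_current >= threshold:
        labels.append("paraphrase")
    if sim_prior >= threshold:
        labels.append("bridging")
    if sim_outside >= threshold:
        labels.append("elaboration")
    return labels or ["irrelevant or too short"]

print(classify_strategies(0.72, 0.61, 0.12))  # ['paraphrase', 'bridging']
```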
This chapter describes how we employ word matching, LSA, and TM in the iSTART feedback systems and reports how well these techniques perform in producing both overall quality and reading strategy scores.
6.2 iSTART: Feedback Systems
iSTART was intended from the outset to employ LSA to determine appropriate
feedback. The initial goal was to develop one or more benchmarks for each of the
SERT strategies relative to each of the sentences in the practice texts and to use
LSA to measure the similarity of a trainee's explanation to each of the benchmarks.
A benchmark is simply a collection of words, in this case, words chosen to represent
each of the strategies (e.g., words that represent the current sentence, words that
represent a bridge to a prior sentence). However, while work toward this goal was progressing, we also developed a preliminary “word-based” (WB) system to supply feedback in the first version of iSTART [19] so that we could offer a complete curriculum for use in experimental situations. The second version of iSTART has
integrated both LSA and WB in the evaluation process; however, the system still
provides only overall quality feedback. Our current investigations aim to provide
feedback based on identifying specific reading strategies.
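As a concrete illustration of the benchmark approach, the sketch below projects an explanation and two benchmarks into a semantic space by summing word vectors and compares them by cosine similarity, a standard way of applying LSA. The 300-dimensional space, its tiny vocabulary, and the random vectors are placeholders for a real LSA space.

```python
import numpy as np

DIM = 300  # dimensionality of the (hypothetical) LSA space

def text_vector(words, space):
    """Sum the vectors of known words: one common way to project a
    word collection (explanation or benchmark) into the space."""
    vecs = [space[w] for w in words if w in space]
    return np.sum(vecs, axis=0) if vecs else np.zeros(DIM)

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v) / denom if denom else 0.0

# Random vectors stand in for a real LSA space (word -> vector).
rng = np.random.default_rng(0)
space = {w: rng.standard_normal(DIM)
         for w in ["cell", "membrane", "protein", "transport", "energy"]}

# Benchmarks: word collections chosen to represent each strategy.
benchmarks = {
    "current sentence": ["cell", "membrane"],
    "prior sentence":   ["protein", "transport"],
}

explanation = ["membrane", "protein", "energy"]
e_vec = text_vector(explanation, space)
for name, words in benchmarks.items():
    print(name, round(cosine(e_vec, text_vector(words, space)), 3))
```

In an actual deployment, the word vectors would come from the singular value decomposition of a term-document matrix built over a training corpus such as TASA, rather than from a random generator.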