Information Technology Reference
In-Depth Information
\[
\mathrm{Sim}(D1, D2) = \frac{\sum_{i=1} (D1_i \times D2_i)}{\sqrt{\sum_{i=1} (D1_i)^2} \times \sqrt{\sum_{i=1} (D2_i)^2}} \tag{6.3}
\]
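As a minimal sketch of how the similarity in Equation 6.3 might be computed, the following function takes two document vectors of equal length and returns their cosine. The function name and the handling of all-zero vectors are assumptions made for illustration; they are not taken from the iSTART implementation.

```python
import math


def cosine_similarity(d1, d2):
    """Cosine between two equal-length document vectors (cf. Eq. 6.3)."""
    if len(d1) != len(d2):
        raise ValueError("vectors must have the same number of dimensions")
    # Numerator: sum of the element-wise products.
    dot = sum(a * b for a, b in zip(d1, d2))
    # Denominator: product of the two Euclidean norms.
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Assumption: an all-zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm1 * norm2)
```

Vectors that point in the same direction give a value of 1, and orthogonal vectors give 0, regardless of vector length.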
Since the first versions of iSTART were intended to improve students' comprehension
of science texts, the LSA space was derived from a collection of science texts
[11]. This corpus consists of 7,765 documents containing 13,502 terms that were used
in two or more documents. By the time the first version of the LSA-based system
was created (referred to as LSA1), the original goal of identifying particular
strategies in an explanation had been replaced with the less ambitious one of rating
the explanation as belonging to one of three levels [22]. The highest level of
explanation, called “global-focused,” integrates the sentence material into a deep
understanding of the text. A “local-focused” explanation explores the sentence in the
context of its immediate predecessors. Finally, a “sentence-focused” explanation goes
little beyond paraphrasing. To assess the level of an explanation, it is compared to
four benchmarks, or bags of words. The rating is based on formulae that use weighted
sums of the four LSA cosines between the explanation and each of the four benchmarks.
The four benchmarks include: 1) the words in the title of the passage (“title”),
2) the words in the sentence (“current sentence”), 3) words that appear in prior
sentences in the text that are causally related to the sentence (“prior text”), and
4) words that did not appear in the text but were used by two or more subjects
who explained the sentence during experiments (“world knowledge”). While the title
and current-sentence benchmarks are created automatically, the prior-text benchmark
depends on a causal analysis of the conceptual structure of the text, relating
each sentence to previous sentences. This analysis requires both time and expertise.
Furthermore, the world-knowledge benchmark requires the collection of numerous
explanations of each text to be used. To evaluate the explanation of a sentence, the
explanation is compared to each benchmark using the similarity function given
above. The result is called a cosine value between the self-explanation (SE) and the
benchmark. For example, Sim(SE, Title) is called the title LSA cosine. Discriminant
analysis was used to construct the formulae that categorized the overall quality as
level 1, 2, or 3 [23]. A score is calculated for each of the levels using these
formulae. The highest of the three scores determines the predicted level of the
explanation. For example, the overall quality score of the explanation is a 1 if the
level-1 score is higher than both the level-2 and level-3 scores.
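The scoring step described above can be sketched as follows. This is only an illustration under stated assumptions: the intercepts and weights below are made-up placeholders, since the text does not report the actual discriminant functions, and the names `LEVEL_FORMULAE`, `level_scores`, and `predict_level` are hypothetical. The sketch also assumes that each level's formula is a constant plus a weighted sum of the four LSA cosines, and that level 1 is sentence-focused, level 2 local-focused, and level 3 global-focused.

```python
# The four benchmark cosines expected as input.
BENCHMARKS = ("title", "current_sentence", "prior_text", "world_knowledge")

# HYPOTHETICAL coefficients, for illustration only. The real values would come
# from a discriminant analysis fitted to human-rated explanations.
LEVEL_FORMULAE = {
    1: {"intercept": 0.0,
        "weights": {"title": 0.5, "current_sentence": 3.0,
                    "prior_text": -1.0, "world_knowledge": -1.0}},
    2: {"intercept": -0.5,
        "weights": {"title": 0.5, "current_sentence": 1.0,
                    "prior_text": 3.0, "world_knowledge": 0.0}},
    3: {"intercept": -1.0,
        "weights": {"title": 1.0, "current_sentence": 1.0,
                    "prior_text": 2.0, "world_knowledge": 3.0}},
}


def level_scores(cosines, formulae=LEVEL_FORMULAE):
    """Return one score per level: intercept plus weighted sum of the cosines."""
    scores = {}
    for level, formula in formulae.items():
        total = formula["intercept"]
        for name in BENCHMARKS:
            total += formula["weights"][name] * cosines[name]
        scores[level] = total
    return scores


def predict_level(cosines, formulae=LEVEL_FORMULAE):
    """Pick the level whose score is highest."""
    scores = level_scores(cosines, formulae)
    return max(scores, key=scores.get)


# Example input: an explanation that mostly overlaps the current sentence.
example = {"title": 0.1, "current_sentence": 0.8,
           "prior_text": 0.1, "world_knowledge": 0.05}
predicted = predict_level(example)
```

With these placeholder weights, an explanation whose largest cosine is with the current sentence is assigned level 1, because the level-1 score exceeds the other two.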
Further investigation showed that the LSA1 cosines and the factors used in
the WB-ASSO approach could be combined in a discriminant analysis that resulted
in better predictions of the values assigned to explanations by human experts.
However, the combined approach was less than satisfactory. Like WB-ASSO, LSA1
was not suitable for an iSTART program that would be readily adaptable to new
practice texts. Therefore, to develop LSA2, we experimented with formulae that would
simplify the data-gathering requirements. Rather than keep all four benchmarks
mentioned above, we discarded the world-knowledge benchmark entirely and replaced
the benchmark based on causal analysis of prior text with one that simply consisted
of the words in the previous two sentences. We could do this because the texts
were taken from science textbooks, whose argumentation tends to be highly linear;
consequently the two immediately prior sentences