is assessed by a ratio of the number of words in the explanation to the number in
the target sentence, taking into consideration the length criterion. For example, if
the length of the sentence is 10 words and the length priority is 1, then the required
length of the self-explanation would be 10 words. If the length of the sentence is 30
words and the length priority is 0.5, then the self-explanation would require a min-
imum of 15 words. Relevance is assessed from the number of matches to important
words in the sentence and words in the association lists. Similarity is assessed in
terms of a ratio of the sentence and explanation lengths and the number of matching
important words. If the explanation is close in length to the sentence, with a high
percentage of word overlap, the explanation would be deemed too similar to the tar-
get sentence. If the explanation failed any of these three criteria (Length, Relevance,
and Similarity), the trainee would be given feedback corresponding to the problem
and encouraged to revise the self-explanation.
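The screening stage described above can be sketched in code. The threshold values, the relevance minimum, and the similarity cutoffs below are illustrative assumptions; the text specifies only the length calculation (sentence length times length priority), not iSTART's actual parameters.

```python
# Sketch of the three screening checks (Length, Relevance, Similarity).
# Only the length formula follows the text directly; the other
# thresholds are assumed for illustration.

def required_length(sentence_words: int, length_priority: float) -> int:
    """Minimum explanation length: sentence length scaled by the priority."""
    return round(sentence_words * length_priority)

def screen_explanation(explanation, sentence, important_words, associations,
                       length_priority=0.5, overlap_limit=0.8):
    exp_words = explanation.lower().split()
    sent_words = sentence.lower().split()

    # Length: the explanation must meet the scaled minimum length.
    length_ok = len(exp_words) >= required_length(len(sent_words),
                                                  length_priority)

    # Relevance: matches to important words and their association lists.
    relevant = ({w.lower() for w in important_words}
                | {w.lower() for w in associations})
    matches = sum(1 for w in exp_words if w in relevant)
    relevance_ok = matches >= 1  # assumed minimal threshold

    # Similarity: near-equal length plus high word overlap is "too similar".
    overlap = len(set(exp_words) & set(sent_words)) / max(len(set(sent_words)), 1)
    too_similar = (abs(len(exp_words) - len(sent_words)) <= 2
                   and overlap >= overlap_limit)

    return {"length": length_ok,
            "relevance": relevance_ok,
            "similarity": not too_similar}
```

For instance, a 30-word sentence with a length priority of 0.5 yields `required_length(30, 0.5) == 15`, matching the example in the text; an explanation failing any check would trigger the corresponding feedback.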
Once the explanation passes the above criteria, then it is evaluated in terms of
its overall quality. The three levels of quality that guide feedback to the trainee are
based on two factors: 1) the number of words in the explanation that match either
the important words or association-list words of the target sentence compared to
the number of important words in the sentence and 2) the length of the explanation
in comparison with the length of the target sentence. This algorithm will be referred to as WB-ASSO, which stands for word-based with association list.
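One way to picture the two-factor quality evaluation is to combine a coverage ratio (matched words over important words) with a length ratio (explanation length over sentence length) and bin the result into three levels. Both the combination rule and the cut points below are assumptions for illustration; the text does not give iSTART's actual scoring function.

```python
# Illustrative mapping from the two stated factors to three quality
# levels. The additive combination and the thresholds are assumed.

def quality_level(matching_words, important_in_sentence, exp_len, sent_len):
    """Return an assumed quality level 1 (lowest) to 3 (highest)."""
    coverage = matching_words / max(important_in_sentence, 1)
    length_ratio = exp_len / max(sent_len, 1)
    score = coverage + length_ratio  # assumed combination rule
    if score >= 2.0:
        return 3
    if score >= 1.0:
        return 2
    return 1
```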
This first version of iSTART (word-based system) required a great deal of human
effort per text, because of the need to identify important words and, especially, to
create an association list for each important word. However, because we envisioned
a scaled-up system rapidly adaptable to many texts, we needed a system that re-
quired relatively little manual effort per text. Therefore, WB-ASSO was replaced.
Instead of lists of important and associated words we simply used content words
(nouns, verbs, adjectives, adverbs) taken literally from the sentence and the entire
text. This algorithm is referred to as WB-TT , which stands for word-based with to-
tal text . The content words were identified using algorithms from Coh-Metrix, an
automated tool that yields various measures of cohesion, readability, other charac-
teristics of language [9, 20]. The iSTART system then compares the words in the
self-explanation to the content words from the current sentence, prior sentences,
and subsequent sentences in the target text, and does a word-based match (both lit-
eral and soundex) to determine the number of content words in the self-explanation
from each source in the text. While WB-ASSO is based on a richer corpus of words
than WB-TT, the replacement was successful because the latter was intended for
use together with LSA, which incorporates the richness of a corpus of hundreds of documents. In contrast, WB-ASSO was used on its own.
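The WB-TT matching step can be sketched as follows: explanation words are compared against content words from each source in the text (current, prior, and subsequent sentences), counting both literal and Soundex matches. The Soundex routine below follows the standard four-character coding; the toy word lists stand in for Coh-Metrix's part-of-speech-based content-word extraction, which is not reproduced here.

```python
# Sketch of WB-TT's word-based match (literal and Soundex) against
# content words from each source in the text.

def soundex(word: str) -> str:
    """Standard four-character Soundex code (first letter + 3 digits)."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    out = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":  # h and w do not reset the previous code
            prev = code
    return (out + "000")[:4]

def match_counts(explanation_words, sources):
    """Count content-word matches (literal or Soundex) per text source.

    sources maps a source name (e.g. "current", "prior", "subsequent")
    to its list of content words.
    """
    counts = {}
    for name, content_words in sources.items():
        literal = set(content_words)
        sounds = {soundex(w) for w in content_words}
        counts[name] = sum(1 for w in explanation_words
                           if w in literal or soundex(w) in sounds)
    return counts
```

The Soundex pass lets near-spellings count as matches (e.g. "powur" matches "power"), which is useful when trainees misspell content words from the text.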
Some hand-coding remained in WB-TT because the length criterion for an expla-
nation was calculated based on the average length of explanations of that sentence
collected from a separate pool of participants and on the importance of the sentence
according to a manual analysis of the text. Besides being relatively subjective, this
process was time consuming because it required an expert in discourse analysis as
well as the collection of self-explanation protocols. Consequently, the hand-coded
length criterion was replaced with one that could be determined automatically from
the number of words and content words in the target sentence (we called this word-
based with total text and automated criteria, or WB2-TT). The change from WB-TT
to WB2-TT affected only the screening process of the length and similarity criteria.
Its lower-bound and upper-bound lengths are entirely based on the target sentence's