Evaluating Self-Explanations in iSTART: Word Matching, Latent Semantic Analysis, and Topic Models - Natural Language Processing and Text Mining

6.2.1 Word Matching Feedback Systems

Word matching is a very simple and intuitive way to estimate the nature of a
self-explanation. In the first version of iSTART, several hand-coded components were
built for each practice text. For example, for each sentence in the text, the
“important words” were identified by a human expert and a length criterion for the
explanation was manually estimated. Important words were generally content words
that were deemed important to the meaning of the sentence and could include words
not found in the sentence. For each important word, an association list of synonyms
and related terms was created by examining dictionaries and existing protocols as
well as by human judgments of what words were likely to occur in a self-explanation
of the sentence. In the sentence “All thunderstorms have a similar life history,” for
example, important words are thunderstorm, similar, life, and history. An association
list for thunderstorm would include storms, moisture, lightning, thunder, cold,
tstorm, t-storm, rain, temperature, rainstorms, and electric-storm. In essence, the
attempt was made to imitate LSA.
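
As an illustration only, the hand-coded components for one sentence could be held in a
small record like the one below. The class name, field names, and the default value of
the length criterion are hypothetical; the text says only that these components were
built by hand, not how they were stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SentenceResources:
    """Hand-coded resources for one sentence (hypothetical layout, not iSTART's)."""
    sentence: str
    important_words: List[str]
    # Maps an important word to its synonyms and related terms.
    association_lists: Dict[str, List[str]] = field(default_factory=dict)
    # Placeholder: the text says only that this criterion was estimated manually.
    length_criterion: int = 0


# Values below are taken from the thunderstorm example in the text.
thunderstorm_sentence = SentenceResources(
    sentence="All thunderstorms have a similar life history.",
    important_words=["thunderstorm", "similar", "life", "history"],
    association_lists={
        "thunderstorm": [
            "storms", "moisture", "lightning", "thunder", "cold", "tstorm",
            "t-storm", "rain", "temperature", "rainstorms", "electric-storm",
        ],
    },
)
```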

A trainee's explanation was analyzed by matching the words in the explanation
against the words in the target sentence and words in the corresponding association
lists. This was accomplished in two ways: (1) literal word matching and (2) Soundex
matching.

Literal word matching - Words are compared character by character, and if the first
75% of the characters of a word in the target sentence (or its association list) are
matched, we call this a literal match. Matching also involves removing the suffixes
-s, -d, -ed, -ing, and -ion from the end of each word. For example, if the trainee's
self-explanation contains 'thunderstom' (even with the misspelling), it still counts
as a literal match with words in the target sentence, since the first nine characters
(75% of the twelve characters in 'thunderstorm') are exactly the same. On the other
hand, if it contains 'thunder,' it will not get a match with the target sentence, but
rather with a word on the association list.
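
The literal matching rule can be sketched as follows. This is a minimal sketch and not
the iSTART code: the function names are hypothetical, and three details are
assumptions the text does not settle, namely that suffixes are stripped before the
comparison, that a stem of at least three characters is kept, and that the 75% prefix
length is rounded up.

```python
import math

# Suffixes named in the text, longest first so "-ed" is tried before "-d".
SUFFIXES = ("ing", "ion", "ed", "s", "d")


def strip_suffix(word: str) -> str:
    """Remove one trailing suffix, keeping a stem of at least 3 characters (assumed)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def is_literal_match(candidate: str, target: str, threshold: float = 0.75) -> bool:
    """True if the candidate matches the first 75% of the target's characters."""
    candidate = strip_suffix(candidate.lower())
    target = strip_suffix(target.lower())
    needed = math.ceil(threshold * len(target))  # rounding up is an assumption
    return len(candidate) >= needed and candidate[:needed] == target[:needed]


# The examples from the text: the misspelling matches, the shorter word does not.
print(is_literal_match("thunderstom", "thunderstorms"))  # True
print(is_literal_match("thunder", "thunderstorms"))      # False
print(is_literal_match("thunder", "thunder"))            # True (association list word)
```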

Soundex matching - This algorithm compensates for misspellings by mapping similar
characters to the same soundex symbol [1, 5]. Words are transformed to their soundex
code by retaining the first character, dropping the vowels, and then converting other
characters into soundex symbols. If the same symbol occurs more than once
consecutively, only one occurrence is retained. For example, 'thunderstorm' will be
transformed to 't8693698'; 'communication' to 'c8368.' Note that the latter example
was originally transformed to 'c888368' and two 8s were dropped ('m' and 'n' are both
mapped to '8'). If the trainee's self-explanation contains 'thonderstorm' or
'tonderstorm,' both will be matched with 'thunderstorm' and this is called a soundex
match. An exact soundex match is required for short words (i.e., those with fewer
than six alpha-characters) due to the high number of false alarms when soundex is
used. For longer words, a match on the first four soundex symbols suffices. We are
considering replacing this rough and ready approach with a spell-checker.
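
The transformation described above can be sketched as follows. Only part of the symbol
table can be read off the two worked examples ('m' and 'n' to '8', 'd' and 't' to '6',
'r' to '9', 'c' and 's' to '3', and 'h' dropped along with the vowels); every other
entry in the table is a hypothetical placeholder. The function names are also
hypothetical, as are the choices to measure word length on the target word and to
count the retained first letter among the "first four soundex symbols".

```python
# Letter groups -> soundex symbol. Entries marked "assumed" are NOT from the text.
_GROUPS = (
    ("bfpv", "1"),      # assumed
    ("cgjkqsxz", "3"),  # c, s from the examples; remaining letters assumed
    ("l", "4"),         # assumed
    ("dt", "6"),        # from the examples
    ("mn", "8"),        # stated in the text
    ("r", "9"),         # from the examples
)
SOUNDEX_SYMBOLS = {letter: symbol for letters, symbol in _GROUPS for letter in letters}


def soundex_code(word: str) -> str:
    """Keep the first letter, drop unmapped letters, collapse repeated symbols."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = [letters[0]]
    for ch in letters[1:]:
        symbol = SOUNDEX_SYMBOLS.get(ch)
        if symbol is None:  # vowels and 'h' (plus 'w', 'y' by assumption) are dropped
            continue
        if code[-1] != symbol:  # retain only one of a run of identical symbols
            code.append(symbol)
    return "".join(code)


def is_soundex_match(candidate: str, target: str) -> bool:
    """Exact code match for short targets, first four code characters otherwise."""
    cand_code, target_code = soundex_code(candidate), soundex_code(target)
    if sum(ch.isalpha() for ch in target) < 6:
        return cand_code == target_code
    return cand_code[:4] == target_code[:4]


print(soundex_code("thunderstorm"))   # t8693698
print(soundex_code("communication"))  # c8368
print(is_soundex_match("tonderstorm", "thunderstorm"))  # True
```

The short-word rule is easy to motivate with this sketch: under the assumed table,
'storm' and 'stern' both reduce to 's698', which is the kind of false alarm the exact
match requirement is meant to limit.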

A formula based on the length of the sentence, the length of the explanation, the
length criterion mentioned above, the number of matches to the important words,
and the number of matches to the association lists produces a rating of 0
(inadequate), 1 (barely adequate), 2 (good), or 3 (very good) for the explanation. The
rating of 0 or inadequate is based on a series of filtering criteria that assess whether
the explanation is too short, too similar to the original sentence, or irrelevant. Length
