classification of tokens was made by a domain expert (a fifth-year computer science
student).
We examined three approaches. All of them used semantic knowledge to adapt
the dictionary that is used by the handwriting recognition engine. They are based on
the assumption that annotations made on lecture slides frequently contain words that
are also written on these slides. In particular, this takes into account domain-specific
words, which are typically not included in the standard dictionary.
Approach 1: Dictionary from all slides. The first experiment adds the tokens from
all slides of the given lecture to the dictionary. Hence, the dictionary is the same for
all annotations from our corpus. It contains 2283 words taken from the 42 slides of
the lecture. We will refer to this dictionary as Dictionary A.
Approach 2: Dictionary from current slide. The second experiment uses different
dictionaries for annotations made on different slides. The dictionary for a specific
annotation contains all tokens extracted from the slide the annotation is located
on. We will refer to these dictionaries as Dictionaries B.
Approach 3: Sliding-window dictionary. The third experiment relies on a sliding-window
approach. Again, annotations made on different slides have different dictionaries.
The dictionary for a specific annotation contains all tokens extracted from
the slide the annotation is located on and all tokens from the preceding five and the
following five slides. If the slide is among the first or last five slides, the window
is truncated to the preceding or subsequent slides that actually exist. We experimented
with different numbers of preceding and subsequent slides and found that, in our case,
five slides on each side provide the best results. Naturally, this number
depends on the specific slide set. We refer to these dictionaries as Dictionaries C.
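The three dictionary constructions above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `slides` list of plain-text slide contents, and the simple lowercase regular-expression tokenizer are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: lowercase words, digits and hyphens allowed."""
    return set(re.findall(r"[a-z][a-z0-9-]*", text.lower()))


def dictionary_all_slides(slides):
    """Approach 1 (Dictionary A): tokens from every slide of the lecture."""
    words = set()
    for slide_text in slides:
        words |= tokenize(slide_text)
    return words


def dictionary_current_slide(slides, index):
    """Approach 2 (Dictionaries B): tokens from the annotated slide only."""
    return tokenize(slides[index])


def dictionary_sliding_window(slides, index, radius=5):
    """Approach 3 (Dictionaries C): the annotated slide plus up to `radius`
    slides before and after it; the window is truncated at both ends."""
    start = max(0, index - radius)
    end = min(len(slides), index + radius + 1)
    words = set()
    for slide_text in slides[start:end]:
        words |= tokenize(slide_text)
    return words
```

With `radius=5` the last function mirrors the window size reported above; a radius at least as large as the slide set reproduces Dictionary A, and a radius of zero reproduces Dictionaries B.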
Improved Performance
Table 5.3 gives the recognition results for domain-specific words for all three types
of dictionaries and contrasts them with the baseline. The use of any of the dictionaries
significantly reduces the word error rate. Dictionaries B (all tokens from the current
slide) clearly outperform the other dictionaries. Compared with using no domain-specific
dictionary, they yield a relative word error rate reduction of almost
20 % (from 45.3 % to 36.5 %). It is not surprising that the character error rate
decreased only slightly, since the
dictionary is not used for recognition on the level of individual characters.
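The relative reduction quoted above follows directly from the word error rates in Table 5.3. The short computation below reproduces it; the helper name `relative_reduction` is introduced here for illustration only.

```python
def relative_reduction(baseline, improved):
    """Relative reduction in percent: (baseline - improved) / baseline * 100."""
    return (baseline - improved) / baseline * 100.0


# Word error rates (%) taken from Table 5.3.
baseline_wer = 45.3
wer_by_dictionary = {"A": 41.2, "B": 36.5, "C": 41.8}

for name, wer in wer_by_dictionary.items():
    print(f"Dictionary {name}: {relative_reduction(baseline_wer, wer):.1f} % relative reduction")
```

For Dictionaries B this gives (45.3 - 36.5) / 45.3, roughly 19.4 %, which matches the "almost 20 %" stated in the text.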
Table 5.3 Performance of the handwriting recognition for domain-specific terms

                                          Word error rate (%)   Character error rate (%)
Baseline: No domain-specific dictionary   45.3                  18.2
Dictionary A (all slides)                 41.2                  16.4
Dictionaries B (current slide)            36.5                  16.2
Dictionaries C (sliding window)           41.8                  17.4