are recognized with high confidence. Part (b) shows examples where recognition
failed. Segmentation problems seem to be the most frequent reason for substitutions.
If the digit is not tightly framed and some additional foreground structure
is present in the normalized image, it deviates from the typical appearance and is
therefore difficult to recognize. Missing digit parts also complicate recognition.
Unusual context may also cause substitutions, as in the example in the second row
of the figure. Here, the left digit has been correctly recognized as 4 by the block
recognizer, but the right digit 0 occurs only very rarely next to a 4 in the dataset. The
digits 6, 4, and 1 are much more common in this context. Consequently, the digit is
recognized as 6 with medium confidence. While in this particular example, the use
of the context information does not seem to be beneficial, in general it facilitates
recognition, as can be concluded from the following control experiment.
The same network was trained with a context vector that was set to zero. Without
access to the context information, the classification performance degrades. The best
test performance for the left digit now has a substitution rate of 1.91%. The best
right digit classifier substitutes as many as 7.25% of the test images when all examples
are accepted. These figures show the importance of context for the recognition of
isolated digits.
7.5.3 Combination with Block Recognition
Digit recognition is not done for all examples, but only if the block recognizer is
not confident enough. If its classification confidence for one of the digits is below
a threshold ρ, this digit is preprocessed and presented to the digit classifier. When
ρ = 0.9 is chosen, 134 (12.2%) of the 1,099 test blocks are rejected. Among these
rejected blocks, 82 (61%) left digits and 101 (75%) right digits are ambiguous.
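
The rejection rule can be sketched as follows. This is only an illustration under stated assumptions: the block recognizer's confidence for a digit is taken to be the gap between its two most active outputs, each digit is assumed to have ten outputs (classes 0 to 9), and the function names and example activities are hypothetical.

```python
# Sketch only: the confidence definition and the 10-way output layout are assumptions.

def confidence(outputs):
    """Gap between the most active and the second most active output."""
    ranked = sorted(outputs, reverse=True)
    return ranked[0] - ranked[1]


def ambiguous_digits(left_outputs, right_outputs, rho=0.9):
    """Return which digits of a block fall below the confidence threshold rho."""
    block = {"left": left_outputs, "right": right_outputs}
    return [name for name, outputs in block.items() if confidence(outputs) < rho]


# Hypothetical block-recognizer outputs for the classes 0..9.
left = [0.0] * 10
left[4] = 0.98                  # clearly a 4
right = [0.0] * 10
right[6], right[0] = 0.6, 0.3   # undecided between 6 and 0

print(ambiguous_digits(left, right))  # ['right']
```

In this sketch a block counts as rejected whenever the returned list is non-empty, and only the listed digits would be passed on to the digit classifier.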
The outputs of the digit recognizer need to be combined with the ones of the
block classifier. This is done by computing the average output $v_c = (v_b + v_d)/2$,
where $v_d$ denotes the output vector of the digit classifier and $v_b$ is the corresponding
section of the block classifier output. The digit's confidence is again set to the
difference between the most active and the second most active combined output. It
does not exceed the higher one of the two digit confidences.
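
A minimal sketch of this combination step is given below. It assumes that both output vectors have the same length and cover the classes 0 to 9; the helper names (`combine_outputs`, `decide`, `peaked`) and the example activities are hypothetical and chosen only for illustration.

```python
def combine_outputs(v_b, v_d):
    """Average the block-classifier section v_b and the digit-classifier output v_d."""
    if len(v_b) != len(v_d):
        raise ValueError("output vectors must have the same length")
    return [(b + d) / 2 for b, d in zip(v_b, v_d)]


def decide(v_c):
    """Return the winning class and the gap to the second most active output."""
    ranked = sorted(range(len(v_c)), key=lambda k: v_c[k], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, v_c[best] - v_c[runner_up]


def peaked(digit, activity, n_classes=10):
    """Hypothetical output vector with a single active class."""
    v = [0.0] * n_classes
    v[digit] = activity
    return v


# Both classifiers vote for 4: the combined output stays confident.
agree = decide(combine_outputs(peaked(4, 0.9), peaked(4, 0.8)))
# The classifiers vote for different digits: the combined confidence collapses.
disagree = decide(combine_outputs(peaked(4, 0.9), peaked(6, 0.8)))

print(agree)     # class 4 with a confidence of about 0.85
print(disagree)  # class 4 with a confidence of about 0.05
```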
Figure 7.18 illustrates some typical cases of output combination. If both classifiers
are confident and agree on the class, the combined output is confident. If the
classifiers disagree on the class, the output is not confident. If one classifier is silent,
Fig. 7.18. Combination of outputs from block classifier and digit classifier: (a) both classifiers
agree; (b) both classifiers disagree; (c) one classifier is inactive, while the other is confident;
(d) one classifier is undecided between two classes, while the other is confident.