Fig. 8.5. Classification results for one page
can be interpreted as a sequence of weighted finite state transducers (WFSTs) that
are combined using the composition operation. We adopt this view by treating our trellis of page classification results as the counterpart of an acoustic model applied to an input utterance in speech recognition. The probability that an individual page belongs to a given class corresponds to an emission probability in the recognition trellis of a speech recognizer.
The restriction that only complete documents may form part of a document sequence corresponds to the use of a language model that renders certain word sequences more likely than others.⁹ The "language model" we currently use contains only binary probability values, which model hard constraints.
However, as with the language models used in speech recognition, we could employ graded constraints, expressed as probabilities on the language model transitions. This could be useful, for instance, for modeling that some sequences of documents are more likely than others, should such preferences exist.
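The decoding this analogy implies can be illustrated with a Viterbi search over the page trellis. The following is a minimal sketch under stated assumptions, not the authors' implementation: it searches the trellis directly instead of building and composing explicit transducers, and it assumes each label is a (document type, page type) pair with the page types start, middle, end and single. The binary "language model" is the hypothetical function `allowed`, which returns only 1.0 or 0.0 and admits only complete documents. The names `allowed` and `viterbi`, the document type "Letter" and all scores are invented for illustration.

```python
from itertools import product

PAGE_TYPES = ("start", "middle", "end", "single")
OPENING = ("start", "single")  # page types that may begin a document
CLOSING = ("end", "single")    # page types that may finish a document


def allowed(prev, cur):
    """Hypothetical binary 'language model': 1.0 if cur may follow prev, else 0.0.

    Labels are (doc_type, page_type) pairs. Only complete documents are
    allowed, so an open document must continue with the same type."""
    (prev_doc, prev_pt), (cur_doc, cur_pt) = prev, cur
    if prev_pt in ("start", "middle"):  # previous document is still open
        return 1.0 if cur_doc == prev_doc and cur_pt in ("middle", "end") else 0.0
    return 1.0 if cur_pt in OPENING else 0.0  # previous document is closed


def viterbi(scores, doc_types, trans=allowed):
    """Return (best_score, best_label_sequence) for a list of pages.

    scores[i] maps (doc_type, page_type) to the classification score of
    page i (missing labels score 0); trans(prev, cur) weights transitions."""
    labels = list(product(doc_types, PAGE_TYPES))
    # The first page must open a document.
    best = {lab: (scores[0].get(lab, 0.0) if lab[1] in OPENING else 0.0, [lab])
            for lab in labels}
    for page in scores[1:]:
        step = {}
        for cur in labels:
            emit = page.get(cur, 0.0)  # "emission" score of this page for label cur
            score, path = max(
                ((p * trans(prev, cur) * emit, prev_path)
                 for prev, (p, prev_path) in best.items()),
                key=lambda cand: cand[0],
            )
            step[cur] = (score, path + [cur])
        best = step
    # The last page must close a document.
    return max(((p, path) for lab, (p, path) in best.items() if lab[1] in CLOSING),
               key=lambda cand: cand[0])


# Hypothetical classification scores for three pages.
pages = [
    {("Taxform", "start"): 0.7, ("Taxform", "single"): 0.2, ("Letter", "single"): 0.1},
    {("Taxform", "end"): 0.6, ("Taxform", "middle"): 0.3, ("Letter", "single"): 0.5},
    {("Letter", "single"): 0.9, ("Taxform", "end"): 0.4},
]
score, best_labels = viterbi(pages, ["Taxform", "Letter"])
print(round(score, 3), best_labels)
# → 0.378 [('Taxform', 'start'), ('Taxform', 'end'), ('Letter', 'single')]
```

Passing a `trans` function that returns probabilities between 0 and 1 instead of only 0.0 and 1.0 would turn these hard constraints into the graded constraints described above.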
In order to apply this analogy, we need to define the topology and contents of two
finite state transducers. For the document type/page type model, the classification
results can be represented in an FST as shown in Figure 8.5. The transitions of a
classification transducer are of two kinds:
• Transitions that represent physical pages carry a symbol for the physical page on both the lower and the upper level, with the classification score as their weight. Which score is attached to a page depends on the topology of the transducer, which is defined by the second kind of transition.
• Transitions with an empty lower level carry boundary information about documents: there are transitions for the start and for the end of a document. Their occurrence on a path therefore determines the page type and hence which score applies. For instance, in Figure 8.5, the topmost transition (with score 0.64) represents a middle page, since no boundary transitions surround it. The second transition chain belongs to a form consisting of a single page and is consequently bounded by both a start indicator and an end indicator. The third and fourth chains belong to a start page and an end page, respectively.
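The topology of the per-page transducer in Figure 8.5 can be sketched as plain data. This is a hedged reconstruction under stated assumptions: the state numbering is invented, boundary arcs carry weight 1.0 (the multiplicative identity) so that a path's weight equals its page score, and only the middle-page score 0.64 comes from the text; the other three scores are placeholders. `Arc`, `page_fst`, `paths` and `page_type` are hypothetical names.

```python
import math
from dataclasses import dataclass

EPS = ""  # empty (epsilon) symbol on the lower level


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    lower: str
    upper: str
    weight: float = 1.0  # boundary arcs keep the neutral weight 1.0


def page_fst(doc_type, scores, page="p"):
    """One page, one document type: four paths from state 0 to state 1.

    scores maps 'middle', 'single', 'start' and 'end' to classification scores."""
    start, end = f"{doc_type}_Start", f"{doc_type}_End"
    arcs = [
        Arc(0, 1, page, page, scores["middle"]),  # no boundaries: middle page
        Arc(0, 2, EPS, start), Arc(2, 3, page, page, scores["single"]), Arc(3, 1, EPS, end),
        Arc(0, 4, EPS, start), Arc(4, 1, page, page, scores["start"]),  # start page
        Arc(0, 5, page, page, scores["end"]), Arc(5, 1, EPS, end),      # end page
    ]
    return arcs, 0, 1  # arcs, initial state, final state


def paths(arcs, state, final):
    """Enumerate every arc sequence from state to final (the FST is acyclic)."""
    if state == final:
        yield []
        return
    for arc in arcs:
        if arc.src == state:
            for rest in paths(arcs, arc.dst, final):
                yield [arc] + rest


def page_type(path):
    """Infer the page type from the boundary (empty-lower-level) arcs on a path."""
    opens = any(a.lower == EPS and a.upper.endswith("_Start") for a in path)
    closes = any(a.lower == EPS and a.upper.endswith("_End") for a in path)
    return {(False, False): "middle", (True, True): "single",
            (True, False): "start", (False, True): "end"}[(opens, closes)]


# 0.64 is from Figure 8.5; the other scores are placeholders.
arcs, q0, qf = page_fst("Taxform", {"middle": 0.64, "single": 0.20, "start": 0.05, "end": 0.11})
by_type = {page_type(p): math.prod(a.weight for a in p) for p in paths(arcs, q0, qf)}
print(by_type)  # → {'middle': 0.64, 'single': 0.2, 'start': 0.05, 'end': 0.11}
```

Because each page type corresponds to exactly one path, reading the boundary arcs on a path is enough to know which of the four classification scores it carries, which mirrors the description of the two kinds of transitions.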
Figure 8.5 contains the information necessary to represent the classification results for one page with regard to one document type. The complete FST representing a problem with three document types and four pages is shown in Figure 8.6. Note
⁹ On a more basic level, the document sequence restrictions can also be likened to the use of a pronunciation dictionary within a speech recognizer. However, the acoustic model and the pronunciation dictionary are usually combined into a single processing step, whereas we explicitly distinguish between the two.