Similarly, the rule FST can be used to impose constraints on the order of doc-
uments, depending on the application. For instance, a mortgage application could
prescribe that an Appraisal is always directly followed by a Note. Using a powerful
representation mechanism such as finite state transducers significantly simplifies the
introduction of additional functionality.
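Such an ordering rule can be pictured as a small finite-state machine. The following sketch (an illustration, not the authors' implementation; the document-type names and the two-state encoding are assumptions) checks the example constraint that an Appraisal is always directly followed by a Note:

```python
# Hypothetical sketch: the rule "an Appraisal is always directly followed
# by a Note" encoded as a two-state finite-state acceptor.
# State 0 = normal, state 1 = just saw an Appraisal.

def accepts(doc_sequence):
    state = 0
    for doc in doc_sequence:
        if state == 1:
            if doc != "Note":
                return False  # rule violated: Appraisal not followed by Note
            state = 0
        elif doc == "Appraisal":
            state = 1
    return state == 0  # the batch must not end right after an Appraisal
```

For example, `accepts(["Note", "Appraisal", "Note"])` holds, while `accepts(["Appraisal", "W2"])` does not. In the full system such rules are transducers composed with the classification graph rather than standalone checks.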
However, there are also drawbacks to a naive implementation of operations over
finite state transducers. The additional representational power comes at a price,
mainly in terms of memory consumption and secondarily in processing time. A
typical mortgage application defines somewhere between 50 and 200 document types
and consequently between 150 and 600 classes for which probabilities and graphs
have to be produced. A batch may be as long as 1000 pages. The composition of
the resulting classification graph with the rule graph requires a great deal of space
to hold and time to construct. We observed graph sizes of over one gigabyte and
runtimes of tens of minutes, clearly insufficient for successful application in
industry.
This situation is again similar to speech recognition with language modeling.
A language model represents an extremely large number of possible and probable
word sequences. Incorporating this knowledge into the basic recognition trellis is
infeasible due to both space and time restrictions. Thus, both knowledge sources
are kept separate. The probabilities delivered by the language model are taken into
account by the search for the best word sequence. For our problem, we also notice
that the composition of the classification and rules FSTs is transient; in the end,
we are interested only in the best path through the composition FST. Thus, it
is unnecessary to completely unfold the composition. Instead, we use a technique
similar to delayed composition [17] combined with beam search [18] to extract the
final result. This reduces memory usage significantly for large problems. Table 8.4
shows the memory usage for a few data points, all of which are well within the
limits of our requirements. The runtime is also much lower; for 1000 pages and
200 categories, the processing time decreases from roughly 16 minutes for the naive
approach to less than 3 minutes for the advanced procedure.
Table 8.4. Memory usage for separation

Batch size (pages)   200 categories   300 categories
1000                 118 MB           151 MB
2000                 211 MB           363 MB
8.5 Results
We evaluated the performance of all probability models for sequences that we de-
scribed in Section 8.4.2. Table 8.5 shows the F1 values for all models. The table
suggests three main results: