Fig. 19.3 Architecture of eDOX Archiver Gateway
The goal of content processing is to support classifying, labeling, and metadata
handling functions, such as:
• extracting: applying dictionary-based and regular-expression-based (regexp) rules to retrieve predefined expressions, words, and data structures;
• indexing: canonicalization of content by transforming expressions to their roots (available only for Hungarian at the moment);
• ranking: enumerating roots and assigning counters and other ranking values based on a ranking algorithm;
• classifying: categorization of a document based on algorithms that analyze content values and content types (e.g., the document is probably an invoice if it contains the string "Invoice" as its title and contains date data types).
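
The four functions above can be illustrated with a minimal Python sketch. It is not the eDOX implementation: the dictionary, the date pattern, the suffix list, and all function names are hypothetical and chosen only for illustration. The root reduction here is a toy English suffix stripper, whereas the text states that indexing is currently available only for Hungarian.

```python
import re
from collections import Counter

# Hypothetical rules: the source does not list the actual dictionaries or patterns.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")   # assumed ISO-style dates
DICTIONARY = {"invoice", "total", "vat"}          # assumed predefined terms
SUFFIXES = ("ing", "s")                           # toy stand-in for a real stemmer
WORD_RE = re.compile(r"[A-Za-z]+")                # ASCII words only (toy tokenizer)


def extract(text):
    """Extracting: retrieve predefined terms (dictionary) and dates (regexp)."""
    words = WORD_RE.findall(text.lower())
    return {
        "dates": DATE_RE.findall(text),
        "terms": [w for w in words if w in DICTIONARY],
    }


def to_root(word):
    """Strip one known suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index(text):
    """Indexing: canonicalize content by reducing each word to its root."""
    return [to_root(w) for w in WORD_RE.findall(text.lower())]


def rank(roots):
    """Ranking: enumerate roots and assign a counter to each (frequency order)."""
    return Counter(roots).most_common()


def classify(title, text):
    """Classifying: probably an invoice if the title says so and a date is present."""
    if "invoice" in title.lower() and DATE_RE.search(text):
        return "invoice"
    return "unknown"


if __name__ == "__main__":
    sample = "Invoices issued 2021-03-15. Total: 100. Invoice total includes VAT."
    print(extract(sample))
    print(rank(index(sample)))
    print(classify("Invoice", sample))
```

A simple frequency count is used for ranking only because the text does not name the ranking algorithm; any other scoring function could be substituted in `rank` without affecting the other steps.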