Based on the legal background and business needs, metadata and labels are
automatically generated and assigned to the content:
labeling: retrieving the most relevant expressions (the results of ranking) and assigning them to the processed document as a tag cloud,
metadating: constructing an XML-based metadata structure (Management Information Resources for eGovernment, MIReG) from the retrieved metadata of the processed document.
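The labeling and metadating steps above can be sketched together as follows. This is a minimal illustration that assumes plain term frequency as the ranking (the chapter does not say how expressions are ranked) and uses hypothetical element names (`metadata`, `tagCloud`, `tag`). It does not follow the actual MIReG schema.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stop-word list; a real system would use a language-specific one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def rank_terms(text: str, top_k: int = 5) -> list[tuple[str, int]]:
    """Rank terms by raw frequency (a stand-in for the real ranking step)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Ties keep first-seen order, so the result is deterministic.
    return counts.most_common(top_k)


def build_metadata(doc_id: str, title: str, tags: list[tuple[str, int]]) -> str:
    """Build an XML metadata record; element names are illustrative, not MIReG."""
    root = ET.Element("metadata", {"id": doc_id})
    ET.SubElement(root, "title").text = title
    cloud = ET.SubElement(root, "tagCloud")
    for term, weight in tags:
        # The weight could drive the font size when the tag cloud is rendered.
        ET.SubElement(cloud, "tag", {"weight": str(weight)}).text = term
    return ET.tostring(root, encoding="unicode")
```

For example, `build_metadata("doc-1", "Sample", rank_terms(ocr_text, top_k=2))` attaches the two most frequent terms of the OCR output to the document record.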
The cryptographic module is invoked to create an electronic signature over the whole data package. Legislation requires this functionality so that the digitized documents are legally acceptable:
digital signing: creating an XML-based electronic signature (XML Advanced Electronic Signatures, XAdES) on a processed document and its related outputs using a cryptographic private key (e.g., RSA), and retrieving a timestamp and revocation information (e.g., a Certificate Revocation List, CRL, or an Online Certificate Status Protocol, OCSP, response).
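The idea of signing the whole data package, rather than a single file, can be sketched in simplified form: hash every output of the processed document into one package digest, then sign that digest with a private key. The component names are hypothetical, and the key is a toy, unpadded textbook-RSA key chosen only for illustration. This is not XAdES, which wraps an XML Signature and adds the timestamp and CRL/OCSP revocation data. A real deployment would use a vetted cryptographic library.

```python
import hashlib

# Toy textbook-RSA key (p=61, q=53): far too small and unpadded for real use.
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def package_digest(parts: dict[str, bytes]) -> bytes:
    """Hash every component (scan, OCR text, metadata, ...) into one digest."""
    outer = hashlib.sha256()
    for name in sorted(parts):  # fixed order so the digest is reproducible
        # Fixed-length hashes of name and content avoid concatenation ambiguity.
        outer.update(hashlib.sha256(name.encode("utf-8")).digest())
        outer.update(hashlib.sha256(parts[name]).digest())
    return outer.digest()


def sign(digest: bytes) -> int:
    """Sign with the private exponent; reducing mod N is a toy-only shortcut."""
    h = int.from_bytes(digest, "big") % N
    return pow(h, D, N)


def verify(digest: bytes, signature: int) -> bool:
    """Check the signature with the public exponent."""
    h = int.from_bytes(digest, "big") % N
    return pow(signature, E, N) == h
```

Because every component's hash feeds into the package digest, changing any single output (for example, the metadata file) changes the digest that was signed.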
As a commercial product, the eDOX Archiver Gateway manages user accounts and supports clearing functionality. Integrating a payment solution is outside the scope of the product. A user account can be topped up with credit in several ways (e.g., a money transfer to a central bank account via a netbanking solution), but these methods must be supported by the surrounding environment:
accounting: calculating accountable values from the actual usage of the service (e.g., processed pages per job).
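The accounting step can be sketched as a usage-based charge deducted from the user's prepaid credit. The tariff (`PRICE_PER_PAGE_CENTS`), the `Account` type, and the rule of rejecting a job that exceeds the balance are hypothetical. The chapter states only that accountable values follow real usage, such as processed pages per job. Amounts are kept in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass

# Hypothetical tariff: the chapter does not state a price.
PRICE_PER_PAGE_CENTS = 2


@dataclass
class Account:
    user: str
    credit_cents: int  # prepaid credit, topped up outside the product


def accountable_value(pages_per_job: list[int]) -> int:
    """Cost in cents, based on the pages actually processed in each job."""
    return sum(pages_per_job) * PRICE_PER_PAGE_CENTS


def charge(account: Account, pages_per_job: list[int]) -> int:
    """Deduct the usage-based cost from the account; refuse if credit is short."""
    cost = accountable_value(pages_per_job)
    if cost > account.credit_cents:
        raise ValueError(f"{account.user}: insufficient credit, top-up required")
    account.credit_cents -= cost
    return cost
```

Under this hypothetical tariff, two jobs of 830 and 120 pages cost 1900 cents. Payment itself stays outside the sketch, matching the product's scope.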
In a legal sense, the output electronic data package is equivalent to the input paper-based document.
19.3.3 Usage of the eDOX Archiver Gateway
During workflow design, the execution statistics of each step were analyzed, as shown in Table 19.1. Based on these statistics, computation-intensive and parallelizable functions, such as recognizing characters (OCRing) and finding the roots of words (indexing), were mapped to the cloud to reduce the execution time of these jobs.
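The trade-off between parallelism and start-up overhead when mapping per-page work to cloud virtual machines can be estimated with a simple makespan model. The sketch makes four assumptions: every page takes the same time, each VM processes its share of pages sequentially after one fixed boot and setup period, all VMs boot in parallel, and every VM is billed until the job finishes. The per-page time of 0.5 min is hypothetical, not a value from Table 19.1.

```python
import math


def makespan(pages: int, vms: int, per_page_min: float, boot_min: float) -> float:
    """Wall-clock minutes: all VMs boot in parallel, then each works through its pages."""
    pages_per_vm = math.ceil(pages / vms)
    return boot_min + pages_per_vm * per_page_min


def vm_minutes(pages: int, vms: int, per_page_min: float, boot_min: float) -> float:
    """Billable VM-minutes, assuming every VM is held until the whole job finishes."""
    return vms * makespan(pages, vms, per_page_min, boot_min)


if __name__ == "__main__":
    PER_PAGE_MIN = 0.5  # hypothetical OCR + indexing time per page
    BOOT_MIN = 10.0     # default VM boot and setup time
    for vms in (1, 10, 100, 830):
        t = makespan(830, vms, PER_PAGE_MIN, BOOT_MIN)
        cost = vm_minutes(830, vms, PER_PAGE_MIN, BOOT_MIN)
        print(f"{vms:4d} VMs: {t:6.1f} min, {cost:8.1f} VM-min")
```

Under these assumptions, an 830-page job takes 425 min on one VM, 51.5 min on 10 VMs (515 VM-minutes), and 10.5 min on 830 VMs (8715 VM-minutes). Once the pages are spread thinly, boot time dominates the wall-clock time while the billed VM time keeps growing.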
The eDOX Archiver Gateway is a commercial product in which users can set priorities and choose the level of parallelization based on the precalculated cost of allocating cloud resources and executing the jobs. In principle, the best throughput time could be achieved by allocating a separate cloud resource to each document page. In practice, however, this may not hold because of the default boot and setup time of virtual machines (10 min). For example, for a larger topic of 830 pages, a virtual machine can be dedicated to each page for OCRing, finding the roots, and indexing. In this case, processing time is measured at