Table 19.1 Execution statistics of eDOX workflow components
#   Step                            Measuring unit        Amount        Note
1   Retrieve document               sec/piece [document]  0.50 - 10.00  Depends on file size (<10 MB) and connection speeds
    (from database or file system)
2   OCRing, Spellchecking           sec/piece [document]  4300.00       In case of avg. 100 pages/document
                                    sec/piece [page]      43.00         Depends on file size and resolution (dpi)
                                    sec/piece [word]      0.22          In case of avg. 200 words/page
3   Extracting, Indexing,           sec/piece [document]  3000.00       In case of avg. 100 pages/document
    Ranking, Classifying,           sec/piece [page]      30.00         In case of avg. 200 words/page
    Labeling, Metadating            sec/piece [word]      0.15          -
4   Digital signing                 sec/piece [document]  2.50          Depends on file size (<10 MB) and connection speed
approximately 90 s, with an additional 10 min to set up all the virtual machines.
This compares rather well to the almost 20-h processing time on a single processor.
Users can access the portal via a web-based GUI. After successful login, users
can upload the documents to be processed, and start processing them by accepting
the calculated cost. The processed contents, including signatures, OCRized contents
and metadata, can be downloaded from the portal.
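The figures in Table 19.1 can be combined into a rough time estimate for a single
document, for example before a user accepts the calculated cost. The following is a
minimal sketch and not the portal's own cost calculation: the function name
estimate_document_seconds and the constant names are assumptions made for
illustration, and the steps are assumed to run one after another on one processor.

```python
from typing import Tuple

# Timings in seconds, copied from Table 19.1.
RETRIEVE_SEC_PER_DOC = (0.50, 10.00)  # range; depends on file size and connection
OCR_SEC_PER_PAGE = 43.00              # OCRing and spellchecking
EXTRACT_SEC_PER_PAGE = 30.00          # extracting, indexing, ..., metadating
SIGN_SEC_PER_DOC = 2.50               # digital signing


def estimate_document_seconds(pages: int) -> Tuple[float, float]:
    """Return a (low, high) estimate of sequential processing time in seconds.

    Hypothetical helper: it simply sums the per-step figures of Table 19.1
    and ignores any parallelism or virtual machine setup time.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    per_page = OCR_SEC_PER_PAGE + EXTRACT_SEC_PER_PAGE
    fixed = pages * per_page + SIGN_SEC_PER_DOC
    low, high = RETRIEVE_SEC_PER_DOC
    return (low + fixed, high + fixed)


if __name__ == "__main__":
    low, high = estimate_document_seconds(100)
    print(f"100 pages: {low:.1f} s to {high:.1f} s")
```

For the average 100-page document assumed in the table, this gives between 7303.0
and 7312.5 seconds, roughly two hours, almost all of it spent on OCR and extraction.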
19.3.4 Further Development Plans
The first release of the eDOX Document Archiver Gateway collects sets of metadata
that are independent of document type, as explained earlier. Future releases will
consider more sophisticated approaches, such as form recognition. This method can
identify metadata that depends on the document type, which makes searching for
information easier in documents such as:
• Invoices [invoice no., issuer, company, price per unit, total price],
• Salary documents [name, birthday, year],
• Sickness/benefit documents [name, birthday, year],
• Employment contracts [name, year, job],
• Delivery notes [date, ID, order ID, company name],
• Ledger documents [year, period],
• Balance sheets [year, period],
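The document types and fields listed above could be captured in a simple lookup
table. This is only an illustrative sketch: the source does not describe how eDOX
stores these definitions, and the key names, field spellings, and the helper
missing_fields are hypothetical.

```python
from typing import Dict, List

# Document types and metadata fields as listed in the text. The identifier
# spellings are assumptions chosen for this sketch, not eDOX names.
METADATA_FIELDS: Dict[str, List[str]] = {
    "invoice": ["invoice_no", "issuer", "company", "price_per_unit", "total_price"],
    "salary_document": ["name", "birthday", "year"],
    "sickness_benefit_document": ["name", "birthday", "year"],
    "employment_contract": ["name", "year", "job"],
    "delivery_note": ["date", "id", "order_id", "company_name"],
    "ledger_document": ["year", "period"],
    "balance_sheet": ["year", "period"],
}


def missing_fields(doc_type: str, extracted: Dict[str, str]) -> List[str]:
    """Return the expected fields that are absent or empty in `extracted`.

    Raises KeyError if `doc_type` is not one of the listed document types.
    """
    expected = METADATA_FIELDS[doc_type]
    return [field for field in expected if not extracted.get(field)]
```

A form recognition step could use such a table to check which type-specific fields
it still has to find, for instance
missing_fields("ledger_document", {"year": "2012"}) reports that the period is missing.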