Commercial Use of WS-PGRADE/gUSE - Science Gateways for Distributed Computing Infrastructures

characteristic metadata of the specific documents. In the first release, the aim of the

gateway was to collect a general set of metadata of all documents (document type, a

characteristic date, company, and person). All of these modules are rather time-consuming

and computation-intensive tasks, requiring DCI (cloud) resources as well as a general

interface to the DCI systems. WS-PGRADE, the DCI Bridge and gUSE have

proven to be good candidates for this.
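
The general metadata set named above (document type, a characteristic date, company, and person) can be pictured as a small record. The following is a minimal sketch only: the class name, field names and types are assumptions made for illustration and are not taken from the eDOX implementation.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import List, Optional


@dataclass
class DocumentMetadata:
    """Hypothetical record for the general metadata set (all names are assumed)."""

    document_type: str                          # e.g. "invoice" or "contract"
    characteristic_date: Optional[date] = None  # a date that characterizes the document
    company: Optional[str] = None
    person: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of the fields that an extraction step left empty."""
        return [name for name, value in asdict(self).items() if value in (None, "")]
```

A record like this makes it easy to see which of the four general fields still need to be filled in after automatic processing.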

19.3.1 Role of WS-PGRADE/gUSE

WS-PGRADE/gUSE is a complex system for supporting workflows that use grid or

cloud resources in the background. In the case of the eDOX Archiver Gateway such

functionality is desired. The eDOX Gateway can access the gUSE system via the

Remote API component (REST API), providing access to WS-PGRADE/gUSE

workflows. However, the biggest advantage in this case is provided by the DCI

Bridge component. The DCI Bridge provides a common interface to several grid and

cloud systems, making job submission to any of the supported DCIs straightforward.

The DCI Bridge, together with its CloudBroker plugin, also acts as a resource

broker. The eDOX Archiver Gateway is a commercial product (and its developer is

an SME), which means that, beyond ensuring the expected level of quality, cost-

effectiveness is an important requirement. The management interface of the DCI

Bridge enables setting configurations that support this objective.
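
The idea of a common submission interface in front of several back ends, combined with a cost-aware choice of resource, can be sketched as follows. This is not the DCI Bridge or CloudBroker API: every class, method and attribute name below (`DciPlugin`, `Bridge`, `price_per_cpu_hour`, and so on) is hypothetical, and the prices are made-up values standing in for whatever an administrator would configure.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    executable: str
    arguments: List[str]


class DciPlugin(ABC):
    """Hypothetical plugin contract: one subclass per grid or cloud back end."""

    name: str
    price_per_cpu_hour: float  # assumed to come from administrator configuration
    enabled: bool = True

    @abstractmethod
    def submit(self, job: Job) -> str:
        """Submit the job and return a back-end specific job identifier."""


class FakeCloud(DciPlugin):
    """Stand-in back end that only records submissions (no real cloud calls)."""

    def __init__(self, name: str, price_per_cpu_hour: float, enabled: bool = True):
        self.name = name
        self.price_per_cpu_hour = price_per_cpu_hour
        self.enabled = enabled
        self.submitted: List[Job] = []

    def submit(self, job: Job) -> str:
        self.submitted.append(job)
        return f"{self.name}-{len(self.submitted)}"


class Bridge:
    """Single entry point that hides which back end actually runs the job."""

    def __init__(self, plugins: List[DciPlugin]):
        self.plugins: Dict[str, DciPlugin] = {p.name: p for p in plugins}

    def cheapest(self) -> DciPlugin:
        """Pick the enabled back end with the lowest configured price."""
        candidates = [p for p in self.plugins.values() if p.enabled]
        if not candidates:
            raise RuntimeError("no enabled DCI plugin is configured")
        return min(candidates, key=lambda p: p.price_per_cpu_hour)

    def submit(self, job: Job) -> str:
        return self.cheapest().submit(job)
```

The caller only ever talks to `Bridge.submit`; enabling, disabling or repricing a back end is a configuration change and requires no change in the gateway code, which is the property that matters for a cost-sensitive commercial deployment.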

19.3.2 Architecture of the eDOX Gateway

The eDOX Archiver Gateway supports the digitization and processing of paper-based

documents. The architecture of the gateway is illustrated in Fig. 19.3.

The eDOX Archiver Gateway document management system (Gateway functional

layer) uses a database (database layer) to store contents and metadata of

documents and workflows (document layer). The cloud management system

(interface layer) is provided by SZTAKI to access the chosen cloud service (cloud

functional layer) via the CloudBroker plugin.

The input of the workflow is the scanned, digitized document (in either PDF or

multi-page TIFF format). This image content is uploaded (via web form, FTP/SCP,

etc.) to the portal, and the server side starts processing. The OCR module covers the

following functionalities:

OCRing: recognition of characters (e.g., Asian font sets), and keeping layout

layer (e.g., creating XPath (XML Path Language) based rules for stylesheets,

Spellchecking: post-processing of OCR-enabled content; approximately 200

languages are supported currently.
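
The upload check and the ordering of the processing steps described above can be sketched as follows. The format check relies on the standard leading bytes of PDF and classic TIFF files; everything else (function names, the `Stage` type, the placeholder OCR and spellcheck stages and their sample text) is hypothetical and only illustrates how stages might be chained, not how the eDOX OCR module is implemented. The sketch does not inspect whether a TIFF file contains multiple pages.

```python
from typing import Callable, List

PDF_MAGIC = b"%PDF-"
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")  # little-endian and big-endian classic TIFF


def detect_format(header: bytes) -> str:
    """Classify an upload by its leading bytes; raise if it is neither PDF nor TIFF."""
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(TIFF_MAGICS):
        return "tiff"
    raise ValueError("unsupported upload: expected PDF or TIFF")


Stage = Callable[[str], str]


def run_pipeline(data: str, stages: List[Stage]) -> str:
    """Apply the processing stages in order, feeding each output to the next."""
    for stage in stages:
        data = stage(data)
    return data


# Placeholder stages: real OCR and spellchecking engines would be plugged in here.
def fake_ocr(_image_ref: str) -> str:
    return "recieved invoice"


def fake_spellcheck(text: str) -> str:
    return text.replace("recieved", "received")
```

With these placeholders, `run_pipeline("scan-001", [fake_ocr, fake_spellcheck])` passes the OCR output to the spellchecker, mirroring the order in which the text presents recognition and post-processing.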
