At first, job Init is executed to set the URL of the selected target to be
harvested from CIARD RING, the path where the records are stored in the grid,
the number of records to be harvested, etc. Then job Harvest executes and
manages the predeployed real harvest process. One of the next parallel conditional
branches (see Agris and LOM2 + LinkChecker) is executed depending on the
metadata format required by the users; it is also responsible for performing the
needed transformations. Both branches validate whether a record can be
transformed to the metadata format or not; hence two record-sets will be produced by
these jobs, with the correct and the wrong records separated. Then, the produced
folders will be compressed and uploaded by the job Upload into the given LFC
folder. As the next step, the job Register registers the uploaded files in CouchDB
by REST service invocation so that the files can be accessed from outside the
grid. Finally, the last job sends a request to CouchDB to get the metadata
information about the registered file-sets in JSON format.
The progress information as well as these output files are sent back (using
Remote API) to the users through the Drupal module, providing complete URLs
where the harvested records can be found.
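The branch selection and the separation of correct and wrong records described above can be illustrated with a minimal sketch. It is not the agINFRA implementation: the functions `to_agris` and `to_lom2`, their validation rules, and the record fields `title` and `url` are hypothetical stand-ins chosen only to show the control flow.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Transformer = Callable[[Record], Record]


def to_agris(record: Record) -> Record:
    """Hypothetical stand-in for the Agris branch: requires a title."""
    if not record.get("title"):
        raise ValueError("missing title")
    return {"format": "agris", "title": record["title"]}


def to_lom2(record: Record) -> Record:
    """Hypothetical stand-in for the LOM2 branch: requires a title and a URL."""
    if not record.get("title") or not record.get("url"):
        raise ValueError("missing title or url")
    return {"format": "lom2", "title": record["title"], "url": record["url"]}


# Only one branch runs per execution, selected by the requested metadata format.
BRANCHES: Dict[str, Transformer] = {"agris": to_agris, "lom2": to_lom2}


def transform_and_split(
    records: List[Record], metadata_format: str
) -> Tuple[List[Record], List[Record]]:
    """Return (correct, wrong): transformed records and the originals that failed."""
    transform = BRANCHES[metadata_format]
    correct: List[Record] = []
    wrong: List[Record] = []
    for record in records:
        try:
            correct.append(transform(record))
        except ValueError:
            # The record cannot be expressed in the requested format.
            wrong.append(record)
    return correct, wrong


# Made-up sample records, for illustration only.
harvested = [
    {"title": "Soil survey", "url": "http://example.org/1"},
    {"title": "Crop yields"},
    {"url": "http://example.org/3"},
]
correct, wrong = transform_and_split(harvested, "lom2")
print(len(correct), "correct,", len(wrong), "wrong")
```

In the workflow itself the two record-sets are written to separate folders, which the Upload job then compresses and uploads; the sketch keeps them as in-memory lists for brevity.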
17.2.3 Use of the agINFRA Science Gateway for Workflows
For agINFRA purposes, gUSE version 3.6.1 is currently installed and
maintained by SZTAKI. The gateway is configured for the gLite-based agINFRA
Virtual Organization, which includes 4 sites in Italy, Serbia, and Hungary, altogether
equipped with 3,500 CPUs and 0.9 petabytes of data storage. As the workflow is in
a prerelease phase, no user statistics are available; however, according to current plans,
we expect only a few direct users among information managers, while the harvested
datasets will serve orders of magnitude more users: researchers, students, librarians,
etc.
17.2.4 Further Development Plans
There is room for improvement at several levels. One ongoing effort (initiated by the
agINFRA partners) is to put the demo version of the cloud-based, on-demand agINFRA
integrated service deployment into production. Its current implementation
deploys the following on the OpenNebula-based SZTAKI cloud: (i) a gUSE instance with its
workflow engine and Remote API to provide an external interface, (ii) BIOVEL and
agINFRA workflows, and (iii) an extensible distributed computing infrastructure based
on SZTAKI Desktop Grid (Kacsuk 2009) and 3G Bridge technologies (Kacsuk
2011b). This approach allows the partners to create the integrated service on demand,
and later also some fine-tailored, temporary micro-portals that enable the user to
handle a subset of the harvested data for further in-depth research and studies. The service