bandwidth play an important role in determining
the scalability of a Grid application.
Data splitting and separation: Data topology
considerations may require splitting, extracting,
or replicating data from the data sources
involved. Two general approaches are suitable
for higher scalability in a Grid application:
independent tasks per job, and a static input file
for all jobs. In the case of independent tasks, the
application can be split into several jobs, each of
which works independently on a disjoint subset of
the input data. Each job produces its own output
data, and gathering the results of all jobs yields
the final output.
The scalability of such a solution depends on the
time required to transfer the input data, and on the
processing time needed to prepare the input data and
to generate the final result. In this case the input
data may be transported to the individual nodes on
which the corresponding jobs are to run. Preloading
the data might be possible, depending on other
criteria such as the timeliness of the data or the size
of the separated data subsets in relation to the
network bandwidth. In the case of static input files,
each job works repeatedly on the same static input
data over a long period of time, but with different
parameters, and generates a different result for each
parameter set. A major improvement in the performance
of the Grid application may be gained by transferring
the input data ahead of time, as close as possible to
the compute nodes.
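
The two approaches above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: local threads stand in for Grid nodes, and the names split_disjoint, job, parameterized_job, run_independent_jobs, and run_parameter_sweep are hypothetical and not part of any Grid toolkit. The first pattern gives each job a disjoint subset of the input, the second gives every job the same static input but a different parameter.

```python
from concurrent.futures import ThreadPoolExecutor


def split_disjoint(data, n_jobs):
    """Split data into n_jobs disjoint, contiguous subsets of near-equal size."""
    size, extra = divmod(len(data), n_jobs)
    chunks, start = [], 0
    for i in range(n_jobs):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def job(subset):
    """Hypothetical job: works only on its own subset (here, a partial sum)."""
    return sum(subset)


def run_independent_jobs(data, n_jobs):
    """Independent tasks per job: one job per disjoint subset, then gather."""
    chunks = split_disjoint(data, n_jobs)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(job, chunks))


def parameterized_job(static_input, exponent):
    """Hypothetical job for the static-input case: same data, other parameter."""
    return sum(x ** exponent for x in static_input)


def run_parameter_sweep(static_input, exponents):
    """Static input for all jobs: every job reads the same data."""
    with ThreadPoolExecutor() as pool:
        return list(
            pool.map(lambda p: parameterized_job(static_input, p), exponents)
        )
```

In the first pattern the input subsets have to travel to the jobs, so transfer time grows with the data volume. In the second pattern the static input can be placed near the compute nodes once and reused for every parameter set.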
Other cases of data separation: More unfavorable
cases may appear when jobs have dependencies on
each other. The application flow should be
carefully checked in order to determine the level of
parallelism that can be reached. The number of jobs
that can run simultaneously without dependencies
is important in this context. Even for independent
jobs, synchronization mechanisms need to be in
place to handle concurrent access to shared data.
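
One way to check the application flow is to write the job dependencies down as a graph and group the jobs into stages whose members do not depend on each other. The following is a minimal sketch that assumes the dependencies are known in advance and contain no cycles. It uses the graphlib module from the Python standard library (Python 3.9 or later); the function names parallel_stages and max_parallelism are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_stages(dependencies):
    """Group jobs into stages; jobs in one stage have no unmet dependencies.

    dependencies maps each job name to the set of jobs it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all jobs runnable right now
        stages.append(ready)
        sorter.done(*ready)
    return stages


def max_parallelism(dependencies):
    """Largest number of jobs that can run together in any one stage."""
    return max(len(stage) for stage in parallel_stages(dependencies))
```

For a flow in which three jobs a, b, and c all wait for a job prep and feed a final job merge, the sketch reports three stages, with at most three jobs running at the same time under this stage-by-stage schedule.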
Synchronizing access to one output file:
Here all jobs work with common input data and
write their output to a common data store.
Generating the output in this way requires
software that synchronizes the jobs. Another way
to handle this case is to let each job generate
its own output file, and then to run a
post-processing program that merges all these
output files into the final result. A similar case
is one in which each job has its own input data
set to consume, and all jobs then write their
output to a common data set. As described above,
the output for the final result can be synchronized
through software designed for the task.
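
Both ways of producing one common result can be sketched as follows. This is an illustration only: threads on one machine stand in for Grid jobs, and a threading.Lock stands in for the synchronization software, whereas a real Grid would need a lock service or a data store with its own concurrency control. The names job_result, write_with_lock, write_then_merge, and demo are hypothetical.

```python
import tempfile
import threading
from pathlib import Path


def job_result(job_id):
    """Hypothetical job output: one text line per job."""
    return f"job {job_id}: {job_id * job_id}\n"


def run_jobs(job, n_jobs):
    """Start one thread per job and wait until all of them have finished."""
    threads = [threading.Thread(target=job, args=(i,)) for i in range(n_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def write_with_lock(n_jobs, out_path):
    """Approach 1: all jobs append to one common file, serialized by a lock."""
    lock = threading.Lock()

    def job(job_id):
        line = job_result(job_id)
        with lock:  # only one job writes at a time
            with open(out_path, "a") as f:
                f.write(line)

    run_jobs(job, n_jobs)


def write_then_merge(n_jobs, work_dir, out_path):
    """Approach 2: each job writes its own file; a later step merges them."""
    work_dir = Path(work_dir)

    def job(job_id):
        (work_dir / f"part-{job_id:04d}.txt").write_text(job_result(job_id))

    run_jobs(job, n_jobs)

    # Post-processing: concatenate the per-job files in a fixed order.
    with open(out_path, "w") as merged:
        for part in sorted(work_dir.glob("part-*.txt")):
            merged.write(part.read_text())


def demo(n_jobs=4):
    """Run both approaches in a temporary directory and return their lines."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        parts = tmp / "parts"
        parts.mkdir()
        write_with_lock(n_jobs, tmp / "locked.txt")
        write_then_merge(n_jobs, parts, tmp / "merged.txt")
        return (
            (tmp / "locked.txt").read_text().splitlines(),
            (tmp / "merged.txt").read_text().splitlines(),
        )
```

With the lock, the order of lines in the common file depends on which job finishes first. The merge step gives a fixed order and needs no synchronization between jobs, at the cost of an extra pass over the output.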
Hence, a thorough evaluation of the input and
output data of the jobs in the Grid application is
needed to handle the data properly. One should
also weigh the available data tools, such as
federated databases, a data joiner, and related
products and technologies, if the Grid application
is highly data oriented or the data has a complex
structure.
PORTING AND PROGRAMMING
GRID APPLICATIONS
Besides taking into account the underlying Grid
resources and the application's data handling, as
discussed in the previous two paragraphs, another
challenge is the porting of the application program
itself. In this context, developers and users
face mainly two different approaches when
implementing their application on a grid. Either
they port existing application code to a set of
distributed Grid resources; often, such an
application was previously developed and
optimized with a specific computer architecture in
mind, for example, mainframes or servers, single-
or multiple-CPU vector computers, shared- or
distributed-memory parallel computers, or loosely
coupled distributed systems like workstation
clusters. Or developers start from
scratch and design and develop a new application
program with the Grid in mind, often such that the
application architecture or its inherent