becomes a dominant factor as processor number
increases, leading to a loss in application scal-
ability with growing number of processors. Gus-
tafson (1988) proved that this holds only for fixed
problem size, and that in practice, with increasing
number of processors, the user increases problem
size as well, always trying to solve the largest
possible problem on any given number of CPUs.
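Gustafson's scaled-speedup argument can be stated in a few lines of code; the following sketch computes the speed-up predicted by his law, where the serial-fraction value used in the example is purely illustrative:

```python
def scaled_speedup(n_procs, serial_fraction):
    """Gustafson's law: speed-up when the problem size grows with the
    processor count, so the parallel part always fills the machine.
    serial_fraction is the non-parallelizable share of the (scaled)
    run time, between 0 and 1."""
    return n_procs - serial_fraction * (n_procs - 1)

# With a 1% serial share, 1024 processors still yield a speed-up of
# roughly 1014, close to the figures Gustafson reported -- whereas
# Amdahl's fixed-size bound for the same serial share is only ~91.
print(scaled_speedup(1024, 0.01))
```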
Gustafson demonstrated this on a 1024-processor parallel system for several applications. For example, he was able to achieve a speed-up factor of over 1000 for a Computational Fluid Dynamics application with 1024 parallel processes on the 1024-processor system. Porting these highly
parallel applications to a grid, however, has shown
that many of them degrade in performance simply
because the latency of message-passing communication operations (e.g. send and receive) grows from a few microseconds on a tightly-coupled parallel system to a few milliseconds on a (loosely-coupled) workstation cluster or grid. In this case,
therefore, we recommend implementing a coarse-grain Domain Decomposition approach, i.e. to
dynamically partition the overall computational
domain into sub-domains (each consisting of as
many parallel processes, volumes, finite elements,
as possible), such that each sub-domain com-
pletely fits onto the available processors of the
corresponding parallel system in the grid. Thus,
only moderate performance degradation from the
reduced number of inter-system communication
can be expected. A prerequisite for this to work
successfully is that the subset of selected parallel
systems is of homogeneous nature, i.e. architecture
and operating system of these parallel systems
should be identical. One Grid infrastructure which
offers this feature is the Distributed European
Infrastructure for Supercomputing Applications
(DEISA, 2010), which (among others) provides
a homogeneous cluster of parallel AIX machines
distributed over several of the 11 European su-
percomputing centers which are part of DEISA
(see also Section 5 in this Chapter).
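The coarse-grain partitioning described above can be sketched in a few lines. The function and parameter names below (`partition_domain`, `cells_per_proc`) are illustrative, not from the source; the sketch assigns each parallel system one contiguous sub-domain sized to fit its processor count, so that inter-system communication is confined to sub-domain boundaries:

```python
def partition_domain(total_cells, system_sizes, cells_per_proc):
    """Coarse-grain domain decomposition: give each grid system one
    contiguous sub-domain that fits its available processors.
    system_sizes: processors available on each parallel system.
    cells_per_proc: cells/volumes/elements one processor can handle.
    Returns a list of (start, end) cell ranges, one per system."""
    capacities = [n * cells_per_proc for n in system_sizes]
    if sum(capacities) < total_cells:
        raise ValueError("domain does not fit on the selected systems")
    ranges, start = [], 0
    for cap in capacities:
        end = min(start + cap, total_cells)
        ranges.append((start, end))
        start = end
    return ranges

# e.g. one million finite volumes over three homogeneous clusters
print(partition_domain(1_000_000, [512, 256, 256], 1000))
```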
Moderately Parallel Applications. These applications, which have been parallelized in the past, often using Message Passing Interface (MPI) library functions for inter-process communication on
workstation clusters or on small parallel systems,
are well-suited for parallel systems with perhaps
a few dozen to a few hundreds of processors,
but they will not scale easily to a large number of parallel processes (and processors). The reasons are a significant scalar portion of the code which cannot run in parallel and/or a relatively high ratio of inter-process communication to computation, resulting in relatively high idle times of the CPUs waiting for the data. Many commercial codes
fall in this category, for example finite-element
codes such as Abaqus, Nastran, or Pamcrash.
Here we recommend checking whether the main goal is to analyze many similar scenarios with one and the same code but on different data sets; if so, one should run as many instances of the code in parallel as possible, on as many moderately parallel sub-systems as possible (this
could be virtualized sub-systems on one large
supercomputer, for example).
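This scenario-sweep pattern can be sketched as below. The `run_scenario` function is a hypothetical stand-in for launching one moderately parallel solver run (e.g. an Abaqus or Nastran job) on a single data set; in practice each worker would submit a job to one sub-system:

```python
from concurrent.futures import ThreadPoolExecutor

def run_scenario(data_set):
    """Stand-in for one moderately parallel solver run on one data
    set; a real version would launch the code on a sub-system and
    collect its results. Here it just returns a dummy value."""
    return sum(data_set)

def sweep(data_sets, max_workers=4):
    """Run independent scenarios side by side, one per sub-system.
    Throughput scales with the number of concurrent instances even
    though each single run does not scale to many processors."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_scenario, data_sets))

if __name__ == "__main__":
    print(sweep([[1, 2], [3, 4], [5, 6]]))
```

Threads suffice here because each worker would mostly wait on an external job, not compute in Python itself.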
Explicit versus Implicit Algorithms. Discrete analogues of systems of partial differential
equations, stemming from numerical methods
such as finite difference, finite volume, or finite
element discretizations, often result in large sets
of explicit or implicit algebraic equations for the
unknown discrete variables (e.g. velocity vectors,
pressure, temperature). The explicit methods
are usually slower (in convergence to the exact
solution vector of the algebraic system) than the
implicit ones but they are also inherently parallel,
because there is no dependence of the solution
variables among each other, and therefore there
are no recursive algorithms. In the case of the more accurate implicit methods, however, the solution variables are highly inter-dependent, leading to recursive sparse-matrix systems of algebraic equations which cannot easily be split (parallelized) into smaller systems. Again, here, we recommend introducing a Domain Decomposition approach as described in the above section on Highly Parallel Applications.