becomes a dominant factor as processor number
increases, leading to a loss in application scal-
ability with growing number of processors. Gus-
tafson (1988) proved that this holds only for fixed
problem size, and that in practice, with increasing
number of processors, the user increases problem
size as well, always trying to solve the largest
possible problem on any given number of CPUs.
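Gustafson's scaled-speedup argument can be stated in a few lines of code; the following sketch computes the speed-up predicted by his law, where the serial-fraction value used in the example is purely illustrative:

```python
def scaled_speedup(n_procs, serial_fraction):
    """Gustafson's law: speed-up when the problem size grows with the
    processor count, so the parallel part always fills the machine.
    serial_fraction is the non-parallelizable share of the (scaled)
    run time, between 0 and 1."""
    return n_procs - serial_fraction * (n_procs - 1)

# With a 1% serial share, 1024 processors still yield a speed-up of
# roughly 1014, close to the figures Gustafson reported -- whereas
# Amdahl's fixed-size bound for the same serial share is only ~91.
print(scaled_speedup(1024, 0.01))
```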
Gustafson demonstrated this on a 1024-processor parallel system for several applications. For example, he was able to achieve a speed-up factor of over 1000 for a Computational Fluid Dynamics application with 1024 parallel processes on the 1024-processor system. Porting these highly
parallel applications to a grid, however, has shown
that many of them degrade in performance simply
because the latency of message-passing communication operations (e.g. send and receive) grows from a few microseconds on a tightly-coupled parallel system to a few milliseconds on a (loosely-coupled) workstation cluster or grid. In this case,
therefore, we recommend implementing a coarse-grain Domain Decomposition approach, i.e. to
dynamically partition the overall computational
domain into sub-domains (each consisting of as
many parallel processes, volumes, finite elements,
as possible), such that each sub-domain com-
pletely fits onto the available processors of the
corresponding parallel system in the grid. Thus,
only moderate performance degradation from the
reduced number of inter-system communication
can be expected. A prerequisite for this to work
successfully is that the subset of selected parallel
systems is of homogeneous nature, i.e. architecture
and operating system of these parallel systems
should be identical. One Grid infrastructure which
offers this feature is the Distributed European
Infrastructure for Supercomputing Applications
(DEISA, 2010), which (among others) provides
a homogeneous cluster of parallel AIX machines
distributed over several of the 11 European su-
percomputing centers which are part of DEISA
(see also Section 5 in this Chapter).
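The coarse-grain partitioning described above can be sketched in a few lines. The function and parameter names below (`partition_domain`, `cells_per_proc`) are illustrative, not from the source; the sketch assigns each parallel system one contiguous sub-domain sized to fit its processor count, so that inter-system communication is confined to sub-domain boundaries:

```python
def partition_domain(total_cells, system_sizes, cells_per_proc):
    """Coarse-grain domain decomposition: give each grid system one
    contiguous sub-domain that fits its available processors.
    system_sizes: processors available on each parallel system.
    cells_per_proc: cells/volumes/elements one processor can handle.
    Returns a list of (start, end) cell ranges, one per system."""
    capacities = [n * cells_per_proc for n in system_sizes]
    if sum(capacities) < total_cells:
        raise ValueError("domain does not fit on the selected systems")
    ranges, start = [], 0
    for cap in capacities:
        end = min(start + cap, total_cells)
        ranges.append((start, end))
        start = end
    return ranges

# e.g. one million finite volumes over three homogeneous clusters
print(partition_domain(1_000_000, [512, 256, 256], 1000))
```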
Moderately Parallel Applications. These applications, which have been parallelized in the past, often using Message Passing Interface (MPI) library functions for inter-process communication on
workstation clusters or on small parallel systems,
are well-suited for parallel systems with perhaps
a few dozen to a few hundreds of processors,
but they will not scale easily to a large number of parallel processes (and processors). The reasons are a significant scalar portion of the code which cannot run in parallel and/or a relatively high ratio of inter-process communication to computation, resulting in relatively high idle times of the CPUs waiting for the data. Many commercial codes
fall in this category, for example finite-element
codes such as Abaqus, Nastran, or Pamcrash.
Here we recommend checking whether the main goal is to analyze many similar scenarios with one and the same code but on different data sets; if so, one should run as many instances of the code in parallel as possible, on as many moderately parallel sub-systems as possible (this
could be virtualized sub-systems on one large
supercomputer, for example).
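This scenario-sweep pattern can be sketched as below. The `run_scenario` function is a hypothetical stand-in for launching one moderately parallel solver run (e.g. an Abaqus or Nastran job) on a single data set; in practice each worker would submit a job to one sub-system:

```python
from concurrent.futures import ThreadPoolExecutor

def run_scenario(data_set):
    """Stand-in for one moderately parallel solver run on one data
    set; a real version would launch the code on a sub-system and
    collect its results. Here it just returns a dummy value."""
    return sum(data_set)

def sweep(data_sets, max_workers=4):
    """Run independent scenarios side by side, one per sub-system.
    Throughput scales with the number of concurrent instances even
    though each single run does not scale to many processors."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_scenario, data_sets))

if __name__ == "__main__":
    print(sweep([[1, 2], [3, 4], [5, 6]]))
```

Threads suffice here because each worker would mostly wait on an external job, not compute in Python itself.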
Explicit versus Implicit Algorithms. Discrete analogues of systems of partial differential
equations, stemming from numerical methods
such as finite difference, finite volume, or finite
element discretizations, often result in large sets
of explicit or implicit algebraic equations for the
unknown discrete variables (e.g. velocity vectors,
pressure, temperature). The explicit methods
are usually slower (in convergence to the exact
solution vector of the algebraic system) than the
implicit ones but they are also inherently parallel,
because there is no dependence of the solution
variables among each other, and therefore there
are no recursive algorithms. In the case of the more accurate implicit methods, however, the solution variables are highly inter-dependent, leading to recursive sparse-matrix systems of algebraic equations which cannot easily be split (parallelized) into smaller systems. Again, here, we recommend introducing a Domain Decomposition approach as described in the above section on Highly Parallel Applications.