Parallel computing is often felt to be the preserve of large number-crunching engineering disciplines. Geography, however, has many large and complex problems that require the use of the largest supercomputers available, for example, climate modelling. GeoComputation (GC) has arisen out of the direct need to solve spatial problems with advanced computational methods. However, geography also has many smaller problems that can still benefit from parallelism, where the current technology is easily available to geographers. This chapter will provide an overview of parallel computing, starting with the main types of parallel computing available and a short history of developments in the field. We will then discuss whether geography even needs data- and/or task-parallel computing and when it should and should not be used. Since the first edition of this book was published, more applications of parallel computing have appeared in the literature, so the current state of play is summarised. Finally, we present a recent example of how parallel computing can be applied to geodemographic classification and end with reflections on the future.
3.2 TYPES OF PARALLEL COMPUTING
Different types of parallel computer are suited to different types of task. Flynn (1972) proposed a classification of computers based on their use of instructions (the program) and their use of data. He divided computers into four possible groups formed by the intersection of machines that use single or multiple streams of data and machines that use single or multiple streams of instructions. First, a classic CPU is a SISD (single instruction, single data) processor that performs one instruction at a time on a single item of data. The operations are sequenced in time and are easily traced and understood. Some would argue that the introduction of pipelining in modern processors adds an element of temporal parallelism to the processing, although this is not true parallelism as it is not completely within the control of the programmer. Second, a MISD (multiple instruction, single data) processor could apply multiple instructions to a single item of data at the same time. This is clearly of no use in the real world and is therefore seen by many as a serious failing of Flynn's classification, since it classifies non-existent processor types. Third, SIMD (single instruction, multiple data) machines have a series of processors that operate in exact lockstep, each carrying out the same operation on a different piece of data at the same time. The experience of the 1980s was that such machines proved to be less than useful for many types of real problem.
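The SIMD idea can be loosely sketched in software: every worker applies the same single operation to a different item of data. The sketch below uses Python's standard library as an analogy only (the function and data are illustrative, and true SIMD lockstep is a hardware feature that a process pool merely approximates):

```python
# Loose software analogy to SIMD: one "instruction" (the function)
# applied across many data items by a pool of workers.
# True SIMD lockstep happens in hardware (e.g. vector units); this
# sketch only mirrors the single-instruction, multiple-data idea.
from concurrent.futures import ProcessPoolExecutor

def scale(x):
    # The single operation every worker applies to its own datum.
    return x * 2.0

def simd_style_map(data):
    # Each item is processed by the same operation, conceptually in parallel.
    with ProcessPoolExecutor(max_workers=4) as pool:
        return list(pool.map(scale, data))
```

A real SIMD machine would execute `scale` on all items in the same clock cycle; the pool merely distributes the same instruction over the data.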
Finally, there are MIMD (multiple instruction, multiple data) machines, which have proven to be more useful, having many processors performing different instructions on different pieces of data at the same time, executing both data and task forms of parallelism. They do not require each processor to be exactly in step with every other processor or even to be carrying out a similar task. This allows the programmer much greater flexibility in programming the machine to carry out the required task, as opposed to coercing the algorithm to fit the machine.
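The contrast with SIMD can be sketched in the same way: under MIMD, independent workers run different instructions on different data at the same time, with no lockstep. A minimal sketch (the two task functions and their inputs are illustrative, not from the chapter):

```python
# Sketch of MIMD-style task parallelism: independent workers execute
# *different* instructions on *different* data simultaneously.
from concurrent.futures import ProcessPoolExecutor

def summarise(values):
    # Task 1: compute a mean of one data set.
    return sum(values) / len(values)

def classify(values):
    # Task 2: an unrelated operation on a different data set.
    return [v > 0 for v in values]

def mimd_style_run(a, b):
    with ProcessPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(summarise, a)   # worker 1: task 1 on data a
        f2 = pool.submit(classify, b)    # worker 2: task 2 on data b
        return f1.result(), f2.result()  # neither worker waits in lockstep
```

Neither worker needs to know what the other is doing, which is exactly the flexibility the MIMD model gives the programmer.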
Parallel computers can also be classified by the degree to which the hardware supports parallelism: multicore and multiprocessor computers have multiple processing units embedded in a single machine with shared memory. Distributed computing (i.e. clusters, grids and massively parallel processing), on the other hand, uses multiple computers connected by a network to carry out the same task. Each processor can only access the memory in its own unit. Thus, if a data value is required by another processor, an explicit message must be sent to request it and another to return the value required. Well-known examples of distributed computing over the Internet are SETI@home (http://setiathome.berkeley.edu), Folding@home (http://folding.stanford.edu) and climate change modelling (http://www.climateprediction.net/). A cluster is a group of stand-alone machines connected via a network. The Beowulf cluster, which consists of many computers connected via a local area network (Sterling et al. 1995), is now used worldwide and is one of the most common types available. The top supercomputer in the world as of June 2013, the Chinese Tianhe-2, is a cluster with more than 3 million Intel processor cores and a performance of 33,862 TFLOPS (http://www.top500.org/), where a FLOPS is one floating-point operation per second. The HTCondor project
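The request/reply exchange described above, where a processor that owns a value only shares it in response to an explicit message, can be sketched with Python's standard multiprocessing pipes standing in for a real message-passing library such as MPI (the worker's data and key names are purely illustrative):

```python
# Sketch of explicit message passing on a distributed-memory model:
# a worker process owns its data and only returns a value on request.
# multiprocessing.Pipe stands in for a real message-passing system
# such as MPI; the two-message request/reply protocol is the point.
from multiprocessing import Pipe, Process

def worker(conn):
    local_data = {"rainfall": 42.5}  # memory only this process can access
    key = conn.recv()                # wait for an explicit request message...
    conn.send(local_data[key])       # ...and send a second message in reply
    conn.close()

def fetch_remote(key):
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(key)             # message 1: request the value
    value = parent_end.recv()        # message 2: receive the reply
    p.join()
    return value
```

On a shared-memory machine the requesting processor could simply read the value; on a distributed-memory machine both messages cross the network, which is why communication cost dominates the design of such programs.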