Parallel computing is often felt to be the preserve of large number-crunching engineering disciplines. Geography, however, has many large and complex problems that require the use of the largest supercomputers available, for example, climate modelling. GeoComputation (GC) has arisen out of the direct need to solve spatial problems with advanced computational methods. However, geography also has many smaller problems that can still benefit from parallelism, where the current technology is easily available to geographers. This chapter will provide an overview of parallel computing, starting with the main types of parallel computing available and a short history of developments in the field. We will then discuss whether geography even needs data- and/or task-parallel computing and when it should and should not be used. Since the first edition of this book was published, more applications of parallel computing have appeared in the literature, so the current state of play is summarised. Finally, we present a recent example of how parallel computing can be applied to geodemographic classification and end with reflections on the future.
3.2 TYPES OF PARALLEL COMPUTING
Different types of parallel computer are suited to different types of task. Flynn (1972) proposed a classification of computers based on their use of instructions (the program) and their use of data. He divided computers into four possible groups formed by the intersection of machines that use single or multiple streams of data and machines that use single or multiple streams of instructions. First, a classic CPU is a SISD (single instruction, single data) processor that performs one instruction at a time on a single item of data. The operations are sequenced in time and are easily traced and understood. Some would argue that the introduction of pipelining in modern processors adds an element of temporal parallelism to the processing, although this is not true parallelism as it is not completely within the control of the programmer. Second, a MISD (multiple instruction, single data) processor could apply multiple instructions to a single item of data at the same time. This is clearly of no use in the real world and is therefore seen by many as a serious failing of Flynn's classification, since it classifies non-existent processor types. Third, SIMD (single instruction, multiple data) machines have a series of processors that operate in exact lockstep, each carrying out the same operation on a different piece of data at the same time. The experience of the 1980s was that such machines proved to be less than useful for many types of real problem.
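The SIMD idea can be loosely sketched in software: every worker applies the same single operation to a different item of data. The sketch below uses Python's standard library as an analogy only (the function and data are illustrative, and true SIMD lockstep is a hardware feature that a process pool merely approximates):

```python
# Loose software analogy to SIMD: one "instruction" (the function)
# applied across many data items by a pool of workers.
# True SIMD lockstep happens in hardware (e.g. vector units); this
# sketch only mirrors the single-instruction, multiple-data idea.
from concurrent.futures import ProcessPoolExecutor

def scale(x):
    # The single operation every worker applies to its own datum.
    return x * 2.0

def simd_style_map(data):
    # Each item is processed by the same operation, conceptually in parallel.
    with ProcessPoolExecutor(max_workers=4) as pool:
        return list(pool.map(scale, data))
```

A real SIMD machine would execute `scale` on all items in the same clock cycle; the pool merely distributes the same instruction over the data.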
Finally, there are MIMD (multiple instruction, multiple data) machines, which have proven to be more useful, having many processors performing different instructions on different pieces of data at the same time, executing both data and task forms of parallelism. They do not require each processor to be exactly in step with every other processor or even to be carrying out a similar task. This allows the programmer much greater flexibility in programming the machine to carry out the required task, as opposed to coercing the algorithm to fit the machine.
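The contrast with SIMD can be sketched in the same way: under MIMD, independent workers run different instructions on different data at the same time, with no lockstep. A minimal sketch (the two task functions and their inputs are illustrative, not from the chapter):

```python
# Sketch of MIMD-style task parallelism: independent workers execute
# *different* instructions on *different* data simultaneously.
from concurrent.futures import ProcessPoolExecutor

def summarise(values):
    # Task 1: compute a mean of one data set.
    return sum(values) / len(values)

def classify(values):
    # Task 2: an unrelated operation on a different data set.
    return [v > 0 for v in values]

def mimd_style_run(a, b):
    with ProcessPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(summarise, a)   # worker 1: task 1 on data a
        f2 = pool.submit(classify, b)    # worker 2: task 2 on data b
        return f1.result(), f2.result()  # neither worker waits in lockstep
```

Neither worker needs to know what the other is doing, which is exactly the flexibility the MIMD model gives the programmer.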
Parallel computers can also be classified by the degree to which the hardware supports parallelism: multicore and multiprocessor computers have multiple processing units embedded in a single machine with shared memory. Distributed computing (i.e. clusters, grids and massively parallel processing), on the other hand, uses multiple computers connected by a network to carry out the same task. Each processor can only access the memory in its own unit. Thus, if a data value is required by another processor, an explicit message must be sent to request it and another to return the value required. Well-known examples of distributed computing over the Internet are SETI@home (http://setiathome.berkeley.edu), Folding@home (http://folding.stanford.edu) and climate change modelling (http://www.climateprediction.net/). A cluster is a group of stand-alone machines connected via a network. The Beowulf cluster, which consists of many computers connected via a local area network (Sterling et al. 1995), is now used worldwide and is one of the most common types available. The top supercomputer in the world as of June 2013, the Chinese Tianhe-2, is a cluster with more than 3 million Intel processor cores and a performance of 33,862 TFLOPS (http://www.top500.org/), where a FLOPS is one floating-point operation per second. The HTCondor project
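The request/reply exchange described above, where a processor that owns a value only shares it in response to an explicit message, can be sketched with Python's standard multiprocessing pipes standing in for a real message-passing library such as MPI (the worker's data and key names are purely illustrative):

```python
# Sketch of explicit message passing on a distributed-memory model:
# a worker process owns its data and only returns a value on request.
# multiprocessing.Pipe stands in for a real message-passing system
# such as MPI; the two-message request/reply protocol is the point.
from multiprocessing import Pipe, Process

def worker(conn):
    local_data = {"rainfall": 42.5}  # memory only this process can access
    key = conn.recv()                # wait for an explicit request message...
    conn.send(local_data[key])       # ...and send a second message in reply
    conn.close()

def fetch_remote(key):
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(key)             # message 1: request the value
    value = parent_end.recv()        # message 2: receive the reply
    p.join()
    return value
```

On a shared-memory machine the requesting processor could simply read the value; on a distributed-memory machine both messages cross the network, which is why communication cost dominates the design of such programs.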