Although it has been used in high-performance computing (HPC) for many years, parallel computing has more recently become embedded in desktops through the development of hyper-threading technology, which produces virtual cores, and of multicore processors, that is, a single component containing two or more independent CPUs (called cores) that can read and execute program instructions (Rauber and Rünger 2010).
Parallel computing should not be confused with so-called multitasking, where a single processor gives the appearance of working on more than one task (or process) at the same time by splitting its resources between competing programs. However, only one process is actually running on
the processor at any one point in time, meaning that the processor is only actively executing spe-
cific instructions for that particular task. Thus, multitasking schedules tasks so as to minimise the
amount of time that the processor is left idle while it waits for slower peripheral activities to occur.
If both programs involved are computationally intensive, then scheduling and waiting overheads may mean that completing the two together takes more than twice as long as running either program on its own.
Some tasks are easy to parallelise. One example often used is building a brick wall. If it would
take one person 4 days to build, it would probably take a well-organised team of four bricklayers
1 day to build it. In terms of parallel computing, this is called speed-up. Speed-up is a term which
is often used to describe how well (or badly) a parallel program is working; it can be defined as the
time taken to run the program on a single processor divided by the time taken to run it on a larger
number of processors (N). The closer the speed-up is to N, the better the program is performing.
An optimal speed-up would be a decrease in runtime that is linearly proportional to the number of
processors, but this is rarely achieved: a small number of processors usually gives almost linear
speed-up, but the gain saturates as the number of processing units grows. It is also important to
note that the algorithms used within a parallelised environment may themselves be optimised for
that purpose, so, beyond the basic speed-up gained by running jobs over multiple processors, these
optimised algorithms may provide further performance improvements.
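The definition of speed-up can be illustrated with a short Python sketch (the task, names and core count below are purely illustrative and not taken from the text): the same work is timed once on a single processor and once spread over N worker processes, and the ratio of the two times gives the speed-up.

    import time
    from multiprocessing import Pool

    def simulate_cell(i):
        """Stand-in for a compute-intensive task on one grid cell (hypothetical)."""
        total = 0.0
        for j in range(200_000):
            total += (i * j) % 7
        return total

    if __name__ == "__main__":
        cells = list(range(64))

        # Serial run: a single processor works through every cell in turn.
        start = time.perf_counter()
        serial_results = [simulate_cell(i) for i in cells]
        t_serial = time.perf_counter() - start

        # Parallel run: the same cells shared among N worker processes.
        n = 4
        start = time.perf_counter()
        with Pool(processes=n) as pool:
            parallel_results = pool.map(simulate_cell, cells)
        t_parallel = time.perf_counter() - start

        # Speed-up = time on one processor / time on N processors;
        # the closer it is to N, the better the program is performing.
        print(f"speed-up on {n} processes: {t_serial / t_parallel:.2f} (ideal: {n})")

In practice the measured speed-up on four processes will fall somewhat short of 4 because of the process start-up and communication overheads discussed below.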
The maximum speed-up, s, achievable through parallelisation, a relationship referred to as
Amdahl's law (Amdahl 1967), is inversely proportional to the fraction of the runtime spent in
those sections of the code that can only run sequentially. For example, if the sequential part of the
code accounts for 10% of the runtime, then the upper limit on speed-up is a factor of 10 over the
non-parallel version, regardless of how many more processors are added. Understanding the critical
path, that is, the order in which dependent calculations must be undertaken, is necessary for
implementing parallel algorithms. If there are no dependencies between the calculations, then all
the tasks (often called threads, fibres or processes depending upon their size) can be run in parallel.
Moreover, not all parallelisation efforts will result in decreased runtime. When a task is divided into an increasing number
of threads, these threads will spend more time communicating with each other. The overheads from
communication will eventually dominate the time spent solving the problem, and further efforts at
parallelisation will simply increase the amount of time needed to complete the task. At the same
time, there are also some tasks that are clearly unsuited to parallelisation. For example, if it takes
a woman 9 months to have a baby, then adding more women will not decrease the overall time it
takes to have a single baby!
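Returning to the 10% example above, Amdahl's law can be written as s = 1 / (f + (1 - f)/N), where f is the fraction of the runtime that must run sequentially and N is the number of processors; as N grows, s approaches 1/f. The minimal Python sketch below (the function name is ours, not from the text) shows the predicted speed-up levelling off at the factor-of-10 limit:

    def amdahl_speed_up(serial_fraction, n_processors):
        """Maximum speed-up predicted by Amdahl's law for a given serial fraction."""
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

    # With 10% of the runtime sequential, the speed-up approaches, but never
    # exceeds, a factor of 10, no matter how many processors are added.
    for n in (2, 8, 64, 1024, 1_000_000):
        print(f"{n:>9} processors: predicted speed-up = {amdahl_speed_up(0.1, n):.2f}")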
There are different levels of parallelism possible, that is, bit-level, instruction-level, data-level
and task-level parallelism (Culler et al. 1999). In the past, speed-up has been achieved through
bit-level parallelism, that is, by doubling the amount of information that a CPU can process in a
single instruction, as 4-bit microprocessors have been replaced by 8-bit, then 16-bit, then 32-bit
and, most recently, 64-bit microprocessors, which are now commonplace. Instruction-level
parallelism, which involves grouping or ordering sets of instructions so that they can be executed in
parallel, was the main form of parallelism in the 1980s and 1990s. Data parallelism involves
distributing the repetitive tasks that occur in loops across different processing units, while task
parallelism involves giving different tasks to different processing units; the former scales well with
the size of the problem, but the latter does not.
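The distinction between data and task parallelism can be sketched with Python's multiprocessing module (the function names and workloads below are hypothetical, not from the text): data parallelism shares the iterations of one loop among the workers, whereas task parallelism hands a fixed set of different tasks to different workers.

    from multiprocessing import Pool

    def process_cell(value):
        """Data parallelism: the same operation is applied to many loop items."""
        return value ** 2

    def load_data():
        return sum(range(1_000))

    def run_model():
        return sum(i * i for i in range(1_000))

    def write_output():
        return "written"

    if __name__ == "__main__":
        # Data parallelism: the loop iterations are shared among the workers;
        # a bigger problem simply means more iterations to distribute.
        with Pool(processes=4) as pool:
            squares = pool.map(process_cell, range(10_000))

        # Task parallelism: each worker is given a different task; the number
        # of distinct tasks is fixed, so it does not grow with the problem size.
        tasks = [load_data, run_model, write_output]
        with Pool(processes=len(tasks)) as pool:
            pending = [pool.apply_async(task) for task in tasks]
            outputs = [p.get() for p in pending]

        print(len(squares), outputs)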