Although it has been used in high-performance computing (HPC) for many years, parallel computing has more recently become embedded in desktops through the development of hyper-threading technology, which produces virtual cores, and of multicore processors, that is, a single component containing two or more independent CPUs (called cores) that can read and execute program instructions (Rauber and Rünger 2010).
Parallel computing should not be confused with so-called multitasking, where a single processor gives the appearance of working on more than one task (or process) at the same time by splitting its resources between competing programs. However, only one process is actually running on
the processor at any one point in time, meaning that the processor is only actively executing spe-
cific instructions for that particular task. Thus, multitasking schedules tasks so as to minimise the
amount of time that the processor is left idle while it waits for slower peripheral activities to occur.
If both programs involved are computationally intensive, then scheduling and waiting overheads may mean that completing the two together takes more than twice as long as running either program on its own.
Some tasks are easy to parallelise. One example often used is building a brick wall. If it would
take one person 4 days to build, it would probably take a well-organised team of four bricklayers
1 day to build it. In terms of parallel computing, this is called speed-up. Speed-up is a term which
is often used to describe how well (or badly) a parallel program is working; it can be defined as the
time taken to run the program on a single processor divided by the time taken to run it on a larger
number of processors (N). The closer the speed-up is to N, the better the program is performing.
An optimal speed-up would be a decrease in runtime that is linearly proportional to the number of
processors, but this is rarely achieved: a small number of processors usually gives almost linear
speed-up, but the gain saturates as the number of processing units grows. It is also important to
note that the algorithms used within a parallelised environment may themselves be optimised for
that purpose, so, beyond the basic speed-up gained by running jobs over multiple processors, these
optimised algorithms may provide further performance improvements.
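The definition of speed-up can be illustrated with a short Python sketch (the task, names and core count below are purely illustrative and not taken from the text): the same work is timed once on a single processor and once spread over N worker processes, and the ratio of the two times gives the speed-up.

    import time
    from multiprocessing import Pool

    def simulate_cell(i):
        """Stand-in for a compute-intensive task on one grid cell (hypothetical)."""
        total = 0.0
        for j in range(200_000):
            total += (i * j) % 7
        return total

    if __name__ == "__main__":
        cells = list(range(64))

        # Serial run: a single processor works through every cell in turn.
        start = time.perf_counter()
        serial_results = [simulate_cell(i) for i in cells]
        t_serial = time.perf_counter() - start

        # Parallel run: the same cells shared among N worker processes.
        n = 4
        start = time.perf_counter()
        with Pool(processes=n) as pool:
            parallel_results = pool.map(simulate_cell, cells)
        t_parallel = time.perf_counter() - start

        # Speed-up = time on one processor / time on N processors;
        # the closer it is to N, the better the program is performing.
        print(f"speed-up on {n} processes: {t_serial / t_parallel:.2f} (ideal: {n})")

In practice the measured speed-up on four processes will fall somewhat short of 4 because of the process start-up and communication overheads discussed below.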
The maximum speed-up, s, achievable through parallelisation, a relationship referred to as
Amdahl's law (Amdahl 1967), is inversely proportional to the fraction of the runtime spent in
those sections of the code that can only run sequentially. For example, if the sequential part of the
code accounts for 10% of the runtime, then the upper limit on speed-up is a factor of 10 over the
non-parallel version, regardless of how many more processors are added. Understanding the critical
path, that is, the order in which dependent calculations must be undertaken, is necessary for
implementing parallel algorithms. If there are no dependencies between the calculations, then all
the tasks (often called threads, fibres or processes depending upon their size) can be run in parallel.
Moreover, not all parallelisation efforts will result in decreased runtime. When a task is divided into an increasing number
of threads, these threads will spend more time communicating with each other. The overheads from
communication will eventually dominate the time spent solving the problem, and further efforts at
parallelisation will simply increase the amount of time needed to complete the task. At the same
time, there are also some tasks that are clearly unsuited to parallelisation. For example, if it takes
a woman 9 months to have a baby, then adding more women will not decrease the overall time it
takes to have a single baby!
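Returning to the 10% example above, Amdahl's law can be written as s = 1 / (f + (1 - f)/N), where f is the fraction of the runtime that must run sequentially and N is the number of processors; as N grows, s approaches 1/f. The minimal Python sketch below (the function name is ours, not from the text) shows the predicted speed-up levelling off at the factor-of-10 limit:

    def amdahl_speed_up(serial_fraction, n_processors):
        """Maximum speed-up predicted by Amdahl's law for a given serial fraction."""
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

    # With 10% of the runtime sequential, the speed-up approaches, but never
    # exceeds, a factor of 10, no matter how many processors are added.
    for n in (2, 8, 64, 1024, 1_000_000):
        print(f"{n:>9} processors: predicted speed-up = {amdahl_speed_up(0.1, n):.2f}")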
There are different levels of parallelism possible, that is, bit-level, instruction-level, data-level
and task-level parallelism (Culler et al. 1999). In the past, speed-up has been achieved through
bit-level parallelism, that is, by doubling the amount of information that a CPU can process in a
single instruction, as 4-bit microprocessors have been replaced by 8-bit, then 16-bit, then 32-bit
and, most recently, 64-bit microprocessors, which are now commonplace. Instruction-level
parallelism, which involves grouping or ordering sets of instructions so that they can be executed in
parallel, was the main form of parallelism in the 1980s and 1990s. Data parallelism involves
distributing the repetitive tasks that occur in loops across different processing units, while task
parallelism involves giving different tasks to different processing units; the former scales well with
the size of the problem, but the latter does not.
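The distinction between data and task parallelism can be sketched with Python's multiprocessing module (the function names and workloads below are hypothetical, not from the text): data parallelism shares the iterations of one loop among the workers, whereas task parallelism hands a fixed set of different tasks to different workers.

    from multiprocessing import Pool

    def process_cell(value):
        """Data parallelism: the same operation is applied to many loop items."""
        return value ** 2

    def load_data():
        return sum(range(1_000))

    def run_model():
        return sum(i * i for i in range(1_000))

    def write_output():
        return "written"

    if __name__ == "__main__":
        # Data parallelism: the loop iterations are shared among the workers;
        # a bigger problem simply means more iterations to distribute.
        with Pool(processes=4) as pool:
            squares = pool.map(process_cell, range(10_000))

        # Task parallelism: each worker is given a different task; the number
        # of distinct tasks is fixed, so it does not grow with the problem size.
        tasks = [load_data, run_model, write_output]
        with Pool(processes=len(tasks)) as pool:
            pending = [pool.apply_async(task) for task in tasks]
            outputs = [p.get() for p in pending]

        print(len(squares), outputs)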