While SIMD can achieve the same result as SPMD, SIMD systems typically
execute in lock step with a central controlling authority for program exe-
cution. As can be seen, when multiple instances of the map function are
executed in parallel, they work on different data streams using the same
map function. In essence, though the underlying hardware can be a MIMD
machine (a compute cluster), the MapReduce platform follows an SPMD
model to reduce programming effort. Of course, while this holds for simple
use cases, a complex application may involve multiple phases, each of which
is solved with MapReduce—in which case the platform will be a combina-
tion of SPMD and MIMD.
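The SPMD pattern described above can be sketched in Python (an illustrative sketch, not a real MapReduce runtime; `map_fn` and the thread pool are hypothetical stand-ins): every worker executes the same map function, but each invocation operates on a different piece of the input data.

```python
# SPMD sketch: one program (map_fn), many data items, executed in parallel.
from concurrent.futures import ThreadPoolExecutor

def map_fn(record):
    # A toy transformation stands in for a real map function.
    return record * record

data = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:  # several workers, one program
    results = list(pool.map(map_fn, data))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because every worker runs identical code, only the data assignment differs per worker, which is exactly what distinguishes SPMD from the general MIMD case.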
17.1.1.2 Data Parallelism versus Task Parallelism
Data parallelism is a way of executing an application in parallel on multiple
processors. It focuses on distributing data across the nodes of the parallel
execution environment and running simultaneous sub-computations on
each node's portion of that data.
This is typically achieved in SIMD (Single Instruction, Multiple Data) mode
and can involve either a single controller driving the parallel data
operations or multiple threads working in the same way on the individual
compute nodes (SPMD). In contrast, task parallelism focuses on distributing
execution threads across the parallel computing nodes. These threads may
execute the same or different code, and they exchange data either through
shared memory or explicit message passing, as dictated by the parallel
algorithm. In the most general case, each of the threads of
a task-parallel system can be doing completely different tasks while
coordinating to solve a specific problem. In the simplest case, all threads can
be executing the same program and differentiating based on their node IDs
to perform any variation in task responsibility. Most common task-parallel
algorithms follow the master-worker model, where there is a single mas-
ter and multiple workers. The master distributes the computation to differ-
ent workers based on scheduling rules and other task-allocation strategies.
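The master-worker model described above can be sketched as follows (a minimal illustration; the queue-based scheduling and the names `worker`, `tasks`, and `results` are assumptions, not any specific framework's API). The master enqueues heterogeneous tasks; workers pull whatever comes next, so different workers may end up executing different code.

```python
# Master-worker sketch: a master enqueues tasks, workers consume them.
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker(worker_id):
    while True:
        task = tasks.get()
        if task is None:          # sentinel: no more work for this worker
            break
        func, arg = task
        results.put((worker_id, func(arg)))

# Master: enqueue two kinds of tasks, i.e., workers run different code paths.
for i in range(4):
    tasks.put((lambda x: x + 1, i))   # task type A
    tasks.put((lambda x: x * 2, i))   # task type B

threads = [threading.Thread(target=worker, args=(w,)) for w in range(2)]
for t in threads:
    t.start()
for _ in threads:
    tasks.put(None)                   # one shutdown sentinel per worker
for t in threads:
    t.join()

values = sorted(results.get()[1] for _ in range(8))
print(values)  # [0, 1, 2, 2, 3, 4, 4, 6]
```

Real schedulers add task-allocation strategies (locality, load balancing) on top of this basic pull model, but the division of roles is the same.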
MapReduce falls under the category of data-parallel SPMD architectures.
Due to the functional programming paradigm used, the individual mapper
processes working on the input splits are neither aware of nor dependent
upon the results of the other mapper processes. Also, since the order of execution
of the mapper function does not matter, one can reorder or parallelize the
execution. Thus, this inherent parallelism enables the mapper function to
scale and execute on multiple nodes in parallel. Along the same lines, the
reduce functions also run in parallel; each instance works on a different
output key. Each key's values are processed independently, again facilitating
implicit data parallelism. The extent of parallel execution is determined by
the number of map and reduce tasks that are configured at the time of job
submission.
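Both sources of parallelism can be seen in a toy word count (a Python sketch under assumed names; `map_fn`, `reduce_fn`, and the thread pools are illustrative, not Hadoop's API): mappers run independently on their splits, and each reducer invocation sees only one key's values.

```python
# Word-count sketch: independent mappers, then one reduce call per key.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_fn(line):
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    return key, sum(values)

splits = ["a b a", "b c", "a c c"]

# Map phase: each split is handled independently; any order yields the
# same set of (key, value) pairs.
with ThreadPoolExecutor(max_workers=3) as ex:
    mapped = [pair for pairs in ex.map(map_fn, splits) for pair in pairs]

# Shuffle: group values by key so each reducer works on a different key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: one independent reduce_fn call per output key.
with ThreadPoolExecutor(max_workers=3) as ex:
    counts = dict(ex.map(lambda kv: reduce_fn(*kv), groups.items()))
print(counts)  # {'a': 3, 'b': 2, 'c': 3}
```

In a real deployment the number of worker pools would be governed by the configured map and reduce task counts mentioned above, rather than fixed in code.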