While SIMD can achieve the same result as SPMD, SIMD systems typically
execute in lock step with a central controlling authority for program exe-
cution. As can be seen, when multiple instances of the map function are
executed in parallel, they work on different data streams using the same
map function. In essence, though the underlying hardware can be a MIMD
machine (a compute cluster), the MapReduce platform follows an SPMD
model to reduce programming effort. Of course, while this holds for simple
use cases, a complex application may involve multiple phases, each of which
is solved with MapReduce—in which case the platform will be a combina-
tion of SPMD and MIMD.
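The SPMD pattern described above can be sketched in Python (an illustrative sketch, not a real MapReduce runtime; `map_fn` and the thread pool are hypothetical stand-ins): every worker executes the same map function, but each invocation operates on a different piece of the input data.

```python
# SPMD sketch: one program (map_fn), many data items, executed in parallel.
from concurrent.futures import ThreadPoolExecutor

def map_fn(record):
    # A toy transformation stands in for a real map function.
    return record * record

data = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:  # several workers, one program
    results = list(pool.map(map_fn, data))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because every worker runs identical code, only the data assignment differs per worker, which is exactly what distinguishes SPMD from the general MIMD case.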
17.1.1.2 Data Parallelism versus Task Parallelism
Data parallelism is a way of executing an application in parallel on multiple
processors. It focuses on distributing data across the nodes of the parallel
execution environment and running simultaneous sub-computations on
each node's portion of that data.
This is typically achieved in SIMD (Single Instruction, Multiple Data) mode
and can involve either a single controller driving the parallel data
operations or multiple threads working in the same way on the individual
compute nodes (SPMD). In contrast, task parallelism focuses on distributing
execution threads across the parallel computing nodes. These threads may
execute the same or different code, and they exchange data either through
shared memory or explicit message passing, as dictated by the parallel
algorithm. In the most general case, each of the threads of
a task-parallel system can be doing completely different tasks while
coordinating to solve a specific problem. In the simplest case, all threads can
be executing the same program and differentiating based on their node IDs
to perform any variation in task responsibility. Most common task-parallel
algorithms follow the master-worker model, where there is a single mas-
ter and multiple workers. The master distributes the computation to differ-
ent workers based on scheduling rules and other task-allocation strategies.
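The master-worker model described above can be sketched as follows (a minimal illustration; the queue-based scheduling and the names `worker`, `tasks`, and `results` are assumptions, not any specific framework's API). The master enqueues heterogeneous tasks; workers pull whatever comes next, so different workers may end up executing different code.

```python
# Master-worker sketch: a master enqueues tasks, workers consume them.
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker(worker_id):
    while True:
        task = tasks.get()
        if task is None:          # sentinel: no more work for this worker
            break
        func, arg = task
        results.put((worker_id, func(arg)))

# Master: enqueue two kinds of tasks, i.e., workers run different code paths.
for i in range(4):
    tasks.put((lambda x: x + 1, i))   # task type A
    tasks.put((lambda x: x * 2, i))   # task type B

threads = [threading.Thread(target=worker, args=(w,)) for w in range(2)]
for t in threads:
    t.start()
for _ in threads:
    tasks.put(None)                   # one shutdown sentinel per worker
for t in threads:
    t.join()

values = sorted(results.get()[1] for _ in range(8))
print(values)  # [0, 1, 2, 2, 3, 4, 4, 6]
```

Real schedulers add task-allocation strategies (locality, load balancing) on top of this basic pull model, but the division of roles is the same.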
MapReduce falls under the category of data-parallel SPMD architectures.
Due to the functional programming paradigm used, the individual mapper
processes working on the input splits are neither aware of nor dependent
upon the results of the other mapper processes. Also, since the order of execution
of the mapper function does not matter, one can reorder or parallelize the
execution. Thus, this inherent parallelism enables the mapper function to
scale and execute on multiple nodes in parallel. Along the same lines, the
reduce functions also run in parallel; each instance works on a different
output key. Each key's values are processed independently, again facilitating
implicit data parallelism. The extent of parallel execution is determined by
the number of map and reduce tasks that are configured at the time of job
submission.
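Both sources of parallelism can be seen in a toy word count (a Python sketch under assumed names; `map_fn`, `reduce_fn`, and the thread pools are illustrative, not Hadoop's API): mappers run independently on their splits, and each reducer invocation sees only one key's values.

```python
# Word-count sketch: independent mappers, then one reduce call per key.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_fn(line):
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    return key, sum(values)

splits = ["a b a", "b c", "a c c"]

# Map phase: each split is handled independently; any order yields the
# same set of (key, value) pairs.
with ThreadPoolExecutor(max_workers=3) as ex:
    mapped = [pair for pairs in ex.map(map_fn, splits) for pair in pairs]

# Shuffle: group values by key so each reducer works on a different key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: one independent reduce_fn call per output key.
with ThreadPoolExecutor(max_workers=3) as ex:
    counts = dict(ex.map(lambda kv: reduce_fn(*kv), groups.items()))
print(counts)  # {'a': 3, 'b': 2, 'c': 3}
```

In a real deployment the number of worker pools would be governed by the configured map and reduce task counts mentioned above, rather than fixed in code.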