Information Technology Reference
In-Depth Information
forms of parallelism lies in the size and type of the tasks. It is also worth noticing
that some people restrict the concept of data parallelism to loop-level parallelism,
but we in this chapter we will refer data parallelism more broadly as parallelism
that arises from dividing some global data structure and the associated computing
operations. Unlike in the case of task parallelism, where the tasks can be large and
independent of each other, the different pieces in data parallelism are logically con-
nected through an underlying global data structure. Work division related to data
parallelism, which is the first step of transforming a serial task into its parallel coun-
terpart, can be non-trivial. So will be the interaction between the processors. Our
focus is therefore on data parallelism in the following text.
10.2.3
Example 1 of Data Parallelism
Let us consider a simple example of evaluating a uni-variable function f.x/ for a
set of x values. In a typical computer program the x values and evaluation results
are stored in one-dimensional arrays, and the computational work is implemented
as the following for -loop in C/C++ syntax:
for (i=0; i<n; i++)
y[i] = f(x[i]);
An important observation is that the evaluations of f for different x values are
independent of each other. If each function evaluation f.x/ is considered as a sep-
arate task, this example can be categorized as task parallel. However, it is more
common to consider the present example as data parallel, because the same func-
tion is applied to an array of numerical values. Parallelism arises from dividing
the x values into subsets, and one subset is given to one processor. Suppose all
the evaluations of f are equally expensive and all processors are equally powerful.
Fairness thus requires that each processor get the same number of x values. (For
cases of different costs of the evaluations and inhomogeneous processors, we refer
to Exercises 10.2 and 10.3 .)
Let P denote the number of processors and n the number of x values; then each
processor should take n=P values. In case n is not divisible by P , the following
formula provides a fair work division:
k C 1
n p D j n
P
if p< mod .n; P /;
(10.4)
0
else ;
where p denotes the processor id, which by convention starts from 0 and stops at
P 1 .In( 10.4 ) the symbol bc denotes the floor function that gives the same result
as integer division in a computer language, and mod .n; P / denotes the remainder
of the integer division n=P . It is clear that the maximum difference between the
different n p values is one, i.e., the fairest possible partitioning.
Search WWH ::




Custom Search