Information Technology Reference
In-Depth Information
Example 10.1.
If
P
is 6 and
n
is 63, then we have
k
D
63
6
D 10
j
n
P
and
mod
.n; P / D
mod
.63; 6/ D 3:
This means that processors with id
p D 0; 1; 2
will be assigned with
n
p
D 11
,
whereas the other processors get
n
p
D 10
.
Finding the number of
x
values for each processor, e.g., by using (
10.4
), does not
complete the work division yet. The next information is about which
x
values should
be assigned to each processor. There can be many different solutions to this parti-
tioning problem, and its impact on the resulting parallel performance is normally
larger for distributed-memory systems than for the shared-memory counterparts.
Often, letting each processor handle a contiguous piece of memory is performance
friendly. In this spirit, the index set
f0;1;:::;n 1g
corresponding to the array
entries can be segmented into
P
pieces, where the start position for processor
p
is
i
start
;p
D p
j
n
P
k
C
min
.p;
mod
.n; P //:
(10.5)
Example 10.2.
Following the above example, where
P
D
6
and
n
D
63
, we can
use (
10.5
)tofind
i
start
;0
D 0
63
6
C
min
.0;
mod
.63; 6// D 0;
i
start
;1
D 1
63
6
C
min
.1;
mod
.63; 6// D 11;
i
start
;2
D 2
63
6
C
min
.2;
mod
.63; 6// D 22;
i
start
;3
D 3
63
6
C
min
.3;
mod
.63; 6// D 33;
i
start
;4
D 4
63
6
C
min
.4;
mod
.63; 6// D 43;
i
start
;5
D 5
63
6
C
min
.5;
mod
.63; 6// D 53:
When the work division is ready, i.e.,
n
p
and
i
start
;p
are computed, the parallelized
computation can be implemented simply as follows:
for (i=i_start_p; i<i_start_p+n_p; i++)
y[i] = f(x[i]);
The above code segment implicitly assumes that all processors can access the
entire
x
and
y
arrays in memory, although each processor is only assigned to work