Information Technology Reference
In-Depth Information
Example 10.1. If P is 6 and n is 63, then we have
k D 63
6
D 10
j n
P
and
mod .n; P / D mod .63; 6/ D 3:
This means that processors with id p D 0; 1; 2 will be assigned with n p
D 11 ,
whereas the other processors get n p D 10 .
Finding the number of x values for each processor, e.g., by using ( 10.4 ), does not
complete the work division yet. The next information is about which x values should
be assigned to each processor. There can be many different solutions to this parti-
tioning problem, and its impact on the resulting parallel performance is normally
larger for distributed-memory systems than for the shared-memory counterparts.
Often, letting each processor handle a contiguous piece of memory is performance
friendly. In this spirit, the index set f0;1;:::;n 1g corresponding to the array
entries can be segmented into P pieces, where the start position for processor p is
i start ;p D p j n
P
k C min .p; mod .n; P //:
(10.5)
Example 10.2. Following the above example, where P
D
6 and n
D
63 , we can
use ( 10.5 )tofind
i start ;0 D 0 63
6
C min .0; mod .63; 6// D 0;
i start ;1 D 1 63
6
C min .1; mod .63; 6// D 11;
i start ;2 D 2 63
6
C min .2; mod .63; 6// D 22;
i start ;3 D 3 63
6
C min .3; mod .63; 6// D 33;
i start ;4 D 4 63
6
C min .4; mod .63; 6// D 43;
i start ;5 D 5 63
6
C min .5; mod .63; 6// D 53:
When the work division is ready, i.e., n p and i start ;p are computed, the parallelized
computation can be implemented simply as follows:
for (i=i_start_p; i<i_start_p+n_p; i++)
y[i] = f(x[i]);
The above code segment implicitly assumes that all processors can access the
entire x and y arrays in memory, although each processor is only assigned to work
Search WWH ::




Custom Search