A Glimpse of Parallel Computing - Elements of Scientific Computing

Information Technology Reference

In-Depth Information

Example 10.1. If P is 6 and n is 63, then we have

k D 63

6

D 10

j n

P

and

mod .n; P / D mod .63; 6/ D 3:

This means that processors with id p D 0; 1; 2 will be assigned with n p

D 11 ,

whereas the other processors get n p D 10 .

Finding the number of x values for each processor, e.g., by using ( 10.4 ), does not

complete the work division yet. The next information is about which x values should

be assigned to each processor. There can be many different solutions to this parti-

tioning problem, and its impact on the resulting parallel performance is normally

larger for distributed-memory systems than for the shared-memory counterparts.

Often, letting each processor handle a contiguous piece of memory is performance

friendly. In this spirit, the index set f0;1;:::;n 1g corresponding to the array

entries can be segmented into P pieces, where the start position for processor p is

i start ;p D p j n

P

k C min .p; mod .n; P //:

(10.5)

Example 10.2. Following the above example, where P

D

6 and n

D

63 , we can

use ( 10.5 )tofind

i start ;0 D 0 63

6

C min .0; mod .63; 6// D 0;

i start ;1 D 1 63

6

C min .1; mod .63; 6// D 11;

i start ;2 D 2 63

6

C min .2; mod .63; 6// D 22;

i start ;3 D 3 63

6

C min .3; mod .63; 6// D 33;

i start ;4 D 4 63

6

C min .4; mod .63; 6// D 43;

i start ;5 D 5 63

6

C min .5; mod .63; 6// D 53:

When the work division is ready, i.e., n p and i start ;p are computed, the parallelized

computation can be implemented simply as follows:

for (i=i_start_p; i<i_start_p+n_p; i++)

y[i] = f(x[i]);

The above code segment implicitly assumes that all processors can access the

entire x and y arrays in memory, although each processor is only assigned to work

Elements of Scientific Computing

Search WWH ::

Custom Search

Home