Biomedical Engineering Reference
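For schema B,

$$
c_B(i) = \sum_{j=0}^{K_1} \binom{n - (i + j(2p))}{k - 1} + \sum_{j=0}^{K_2} \binom{n - (j(2p) + 2p - i + 1)}{k - 1}, \tag{11.6}
$$

where

$$
K_1 = \begin{cases} K & \text{if } n - (i + 2Kp) \ge k - 1, \\ K - 1 & \text{otherwise,} \end{cases}
\qquad
K_2 = \begin{cases} K & \text{if } n - (2Kp + 2p - i + 1) \ge k - 1, \\ K - 1 & \text{otherwise,} \end{cases}
$$

$$
K = \left\lfloor \frac{n}{2p} \right\rfloor .
$$
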
In Eq. (11.6), the first summation is for forward distribution and the second for back-
ward distribution in each round. The workload imbalance is defined as the difference
between the amount of work on the processor with maximum workload and on the
processor with the minimum workload. The workload imbalance was calculated as a
function of the total number of features for 2, 4, 8, 16, 32, and 64 processors using
schemata A and B. The results of this calculation are shown in Figure 11.5. In the
figure, the plain lines represent the work imbalance calculated using schema A. The
lines with dots represent the work imbalance calculated using schema B. The work
imbalance differs by several orders of magnitude between the two schemata.
Using schema B, the imbalance increases with the number of processors. This simply
occurs because the correction that is introduced by reversing the direction of work
assignment is much smaller when there are many processors than when there are only
a few. As the differences between schemata B and C are small compared with the
differences between A and B, only schemata A and B are shown. For simplicity, we
selected schema B for implementation.
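
The schema B workload and the resulting imbalance can be sketched numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes that features are numbered 1 to n, that the work attached to feature m is the binomial coefficient C(n - m, k - 1) as in the summands of Eq. (11.6), and that each round of 2p features is dealt forward to processors 1..p and then backward to p..1. The function names are hypothetical.

```python
from math import comb


def schema_b_workloads(n, k, p):
    """Return the workload of each of p processors under schema B.

    Assumption: feature m (1-based) costs comb(n - m, k - 1) units of work.
    """
    loads = [0] * p
    for m in range(1, n + 1):
        pos = (m - 1) % (2 * p)  # position of feature m inside its round of 2p
        # First half of the round is dealt forward, second half backward.
        proc = pos if pos < p else 2 * p - 1 - pos
        loads[proc] += comb(n - m, k - 1)
    return loads


def imbalance(loads):
    """Workload imbalance: maximum workload minus minimum workload."""
    return max(loads) - min(loads)


if __name__ == "__main__":
    n, k = 2303, 3  # values used for the lymphoma data set in the text
    for p in (2, 4, 8, 16, 32, 64):
        print(p, imbalance(schema_b_workloads(n, k, p)))
```

Summing the per-processor workloads should recover C(n, k), the total number of feature combinations, which gives a quick consistency check on the assignment.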
We implemented a parallel version of the σ-classifier using schema B and tested
this implementation with a lymphoma data set [26] consisting of 30 samples with
2303 genes. The measured run times and efficiencies for 1, 2, 4, 8, 16, and 32 processors
are shown in Table 11.3 with k = 3 and n = 2303. The schema B partitioning
worked well: the implementation scaled to 32 processors with high efficiency.
The efficiency exceeds 100% because of slight differences between the parallel and serial
versions of the software. The serial version was designed to allow the user a selection
of several options. The parallel version was optimized to run full searches only;
therefore, several conditional statements were removed from the main loop of the parallel
version, leading to a significant performance improvement. The parallel version
does not run on a single processor. The adjusted efficiency shown in the table is the
efficiency calculated assuming a serial time of double the two-processor time. When
compared with the two-processor timing, the efficiency remains over 95% even for
32 processors.
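
The two efficiency measures described above can be written down explicitly. The sketch below assumes the usual definition of parallel efficiency, serial time divided by the product of processor count and parallel time, which the text does not state; the adjusted efficiency follows the text by substituting twice the two-processor time for the serial time. The function names are hypothetical and no timings from Table 11.3 are reproduced.

```python
def efficiency(serial_time, parallel_time, processors):
    """Parallel efficiency, assumed here to be T_serial / (p * T_p)."""
    return serial_time / (processors * parallel_time)


def adjusted_efficiency(two_proc_time, parallel_time, processors):
    """Efficiency with the serial time taken as double the two-processor time."""
    return efficiency(2.0 * two_proc_time, parallel_time, processors)
```

By construction the adjusted efficiency is exactly 1.0 (100%) for two processors, so it measures scaling relative to the two-processor run and removes the effect of the slower serial code.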