Figure 8.11
Scalability of processing task-parallel jobs in ProRata.
[Plot omitted. Horizontal axis: Number of Processors (2, 4, 8, 16). Data labels: 2.1, 3.5, 5.2, 20.7.]
validation of results. Next we describe how ProRata addresses the problem of
efficiency in data processing and the problem of data noise.
8.6.1.1 Parallel Processing of Core Analysis Steps in ProRata
The number of files typically generated by mass spectrometry devices for a whole-proteome experiment easily reaches several thousand. Although each individual file is relatively small, processing them all is time-consuming. Since the initial processing of an individual file does not depend on the other files, the R version of ProRata allows one to employ the task-parallelism feature of pR to process all these files concurrently on multiple processors. Algorithm 8.1 shows the fragment of the ProRata code with the slight changes that were introduced to the serial R code to enable such task-parallel processing. The modifications, made via PE(), are highlighted in boldface.
As a result, a linear speed-up was obtained, as shown in Figure 8.11. Note that a superlinear speed-up was observed for 16 processors, partly because each processor had its own copy of the file stored on a local disk (see Algorithm 8.1).
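The task-parallel pattern described above can be illustrated with a minimal sketch. It does not reproduce the ProRata code or the pR PE() interface; instead it assumes base R's parallel package as a stand-in, and the per-file function process_file() and the generated input files are hypothetical placeholders. The point is only that, because each file is processed independently, a serial lapply() over the files can be swapped for a parallel map with the same result.

```r
library(parallel)  # ships with base R; a stand-in here, not pR's PE()

# Hypothetical per-file step: read one small file and return a summary.
process_file <- function(path) {
  values <- scan(path, quiet = TRUE)
  c(n = length(values), mean = mean(values))
}

# Create a few small stand-in input files in a temporary directory.
input_dir <- tempfile("spectra_")
dir.create(input_dir)
for (i in 1:8) {
  writeLines(as.character(i * 1:10),
             file.path(input_dir, sprintf("run_%02d.txt", i)))
}
files <- sort(list.files(input_dir, full.names = TRUE))

# Serial version: one file after another.
serial_results <- lapply(files, process_file)

# Task-parallel version: files are distributed over worker processes.
# mclapply() relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
parallel_results <- mclapply(files, process_file, mc.cores = n_cores)

# Both versions should produce the same per-file summaries.
print(isTRUE(all.equal(serial_results, parallel_results)))
```

Only the call that maps over the files changes between the two versions, which mirrors the "slight changes" to the serial code that the text attributes to the pR-based version.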
Likewise, one of the key steps in ProRata is principal component analysis (PCA) (see details in Section 8.6.1.2). R supports this kind of analysis through its prcomp() function, which internally calls a serial singular value decomposition function, svd(). Since svd() is a matrix computation, it is computationally demanding for large matrices. A parallel, optimized svd() is, however, available through the ScaLAPACK parallel linear algebra package. To invoke a parallel version of the prcomp() function, one needs to load the RScaLAPACK library and replace the call to prcomp() with a call to sla.prcomp().
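To make the link between prcomp() and svd() concrete, the following sketch recomputes a PCA directly from the singular value decomposition of the column-centered data and compares it with the output of prcomp(). The random matrix X is an illustrative assumption, not ProRata data. The RScaLAPACK replacement named in the text appears only as a comment, since it requires that package and an MPI setup and is not run here.

```r
set.seed(1)
X <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)  # illustrative data only

# Serial PCA as provided by base R (columns are centered, not scaled).
pca <- prcomp(X)

# The same quantities obtained directly from the SVD of the centered data.
Xc <- scale(X, center = TRUE, scale = FALSE)
s <- svd(Xc)
sdev_svd <- s$d / sqrt(nrow(X) - 1)  # component standard deviations

# Standard deviations agree; loadings agree up to the sign of each column.
print(isTRUE(all.equal(pca$sdev, sdev_svd)))
print(isTRUE(all.equal(abs(unname(pca$rotation)), abs(s$v))))

# Per the text, the parallel variant would load RScaLAPACK and call
# sla.prcomp() in place of prcomp(); not executed in this sketch.
```

Because the SVD accounts for essentially all of the work in this computation, replacing it with a parallel implementation is what makes the parallel PCA worthwhile for large matrices.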
These slight modifications for getting access to the data-parallelism feature of