Processing data/movies.csv
Processing data/top250.csv
Here's the same example, but now using parallel:
$ find data -name '*.csv' -print0 | parallel -0 echo "Processing {}"
Processing data/countries.csv
Processing data/movies.csv
Processing data/top250.csv
The -print0 option makes find terminate each filename with a null character instead of a newline, and the -0 option tells parallel to split its input on null characters. Together they allow filenames that contain newlines or other whitespace to be interpreted correctly by programs that process the output of find. If you are absolutely certain that the filenames contain no special characters such as spaces and newlines, you can omit the -print0 and -0 options.
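To see why the null separator matters without needing parallel installed, here is a small bash sketch that reads find's null-delimited output with a while loop, which splits on the same character that parallel -0 does. The function name process_csvs and the demo directory and file names are hypothetical, chosen only for illustration; "top 250.csv" deliberately contains a space.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "Processing <name>" for every .csv file under
# a directory. find -print0 ends each name with a NUL byte and read -d ''
# splits on NUL, so spaces or newlines inside a filename are not treated
# as separators. sort -z only makes the output order predictable.
process_csvs() {
  find "$1" -name '*.csv' -print0 | sort -z |
  while IFS= read -r -d '' file; do
    echo "Processing ${file#"$1"/}"
  done
}

# Hypothetical demo directory with one filename that contains a space.
demo_dir=$(mktemp -d)
touch "$demo_dir/movies.csv" "$demo_dir/top 250.csv"
process_csvs "$demo_dir"
rm -r "$demo_dir"
```

Replacing -print0 with plain newline output and read -d '' with a plain read would still work here, but it would break as soon as a filename contained a newline.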
If the list to process becomes too complex, you can always store it in a temporary file and then use the method for looping over lines from a file.
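A minimal sketch of that two-step approach in bash, assuming no filename contains a newline (the temporary file is newline-delimited). The function name list_then_process and the demo files are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the list of .csv files to a temporary file,
# one name per line, then loop over the lines of that file.
list_then_process() {
  local list
  list=$(mktemp)
  find "$1" -name '*.csv' | sort > "$list"
  # IFS= and -r keep leading spaces and backslashes in names intact.
  while IFS= read -r file; do
    echo "Processing ${file#"$1"/}"
  done < "$list"
  rm -f "$list"
}

demo_dir=$(mktemp -d)   # hypothetical demo directory
touch "$demo_dir/countries.csv" "$demo_dir/movies.csv"
list_then_process "$demo_dir"
rm -r "$demo_dir"
```

Keeping the list in a file also lets you inspect or edit it before processing.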
Parallel Processing
Assume that we have a very long-running command, such as the one shown in Example 8-1.
Example 8-1. ~/book/ch08/slow.sh
#!/bin/bash
echo "Starting job $1"
duration=$((1+RANDOM%5))
sleep $duration
echo "Job $1 took ${duration} seconds"
$RANDOM is a built-in Bash variable that expands to a new pseudorandom integer between 0 and 32,767 each time it is referenced. The remainder of that number divided by 5 lies between 0 and 4, so adding 1 ensures that the duration is between 1 and 5 seconds.
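A quick way to check that range empirically is to evaluate the same arithmetic expression many times and record the smallest and largest values seen. The function name random_duration and the sample count of 1,000 are arbitrary choices for this sketch; since the values are random, no particular output is shown.

```shell
#!/usr/bin/env bash
# Same expression as in slow.sh: RANDOM is 0..32767, so RANDOM % 5 is 0..4
# and adding 1 yields 1..5.
random_duration() {
  echo $(( 1 + RANDOM % 5 ))
}

# Draw samples and track the extremes.
min=5
max=1
for _ in {1..1000}; do
  d=$(random_duration)
  if (( d < min )); then min=$d; fi
  if (( d > max )); then max=$d; fi
done
echo "min=$min max=$max"
```

With this many samples you would normally expect to see min=1 and max=5, and never anything outside that range.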
This process does not use all of the resources we have available, yet we need to run it many times. For example, we might need to download a long sequence of files.
A naive way to parallelize is to run the commands in the background:
$ for i in {1..4}; do
> (./slow.sh $i; echo Processed $i) &
> done
[1] 3334
[2] 3335