Processing data/movies.csv
Processing data/top250.csv
Here's the same example, but now using parallel:
$ find data -name '*.csv' -print0 | parallel -0 echo "Processing {}"
Processing data/countries.csv
Processing data/movies.csv
Processing data/top250.csv
The -print0 option makes find terminate each filename with a null character instead
of a newline, and the -0 option tells parallel to expect that delimiter. Together they
allow filenames that contain newlines or other types of whitespace to be interpreted
correctly by programs that process the output of find. If you are absolutely certain
that the filenames contain no special characters such as spaces and newlines, then
you can omit the -print0 and -0 options.
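To see why the null delimiter matters, here is a minimal sketch that uses xargs rather than parallel, since xargs accepts the same -0 option and is installed almost everywhere. The scratch directory and the filename "my movies.csv" are made up for illustration, and the sketch assumes that the temporary directory path itself contains no whitespace.

```shell
# Hypothetical scratch directory containing one filename with a space in it.
dir=$(mktemp -d)
touch "$dir/top250.csv" "$dir/my movies.csv"

# Null-delimited: every filename arrives as exactly one argument.
safe=$(find "$dir" -name '*.csv' -print0 |
    xargs -0 -n 1 echo "Processing" | wc -l | tr -d ' ')

# Whitespace-delimited: "my movies.csv" is split into two arguments.
unsafe=$(find "$dir" -name '*.csv' |
    xargs -n 1 echo "Processing" | wc -l | tr -d ' ')

echo "with -print0 and -0: $safe lines; without: $unsafe lines"
rm -r "$dir"
```

With the null delimiter there is one line of output per file. Without it, xargs splits the awkward filename at the space and reports more items than there are files.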
If the list to process becomes too complex, you can always store the
result in a temporary file and then use the method for looping over
lines from a file.
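As a rough sketch of that approach, the following stores the output of find in a temporary file and then reads it back line by line. The scratch directory, the variable names, and the use of mktemp are assumptions made for illustration. Because the list is newline-delimited, this only works when no filename contains a newline.

```shell
# Hypothetical scratch directory standing in for data/.
dir=$(mktemp -d)
touch "$dir/countries.csv" "$dir/movies.csv" "$dir/top250.csv"

# Store the (possibly complex) list in a temporary file first.
list=$(mktemp)
find "$dir" -name '*.csv' | sort > "$list"

# Then loop over the lines of that file.
count=0
while IFS= read -r file; do
    echo "Processing $(basename "$file")"
    count=$((count + 1))
done < "$list"

rm -r "$dir" "$list"
```

Redirecting the file into the loop, rather than piping into it, keeps the loop in the current shell, so variables such as count are still set once the loop finishes.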
Parallel Processing
Assume that we have a very long-running command, such as the one shown in
Example 8-1.
Example 8-1. ~/book/ch08/slow.sh
#!/bin/bash
echo "Starting job $1"
duration=$((1+RANDOM%5))
sleep $duration
echo "Job $1 took ${duration} seconds"
$RANDOM is a special Bash variable that expands to a pseudorandom integer
between 0 and 32,767 each time it is referenced. Taking the remainder of the division
of that number by 5 yields a value between 0 and 4, and adding 1 ensures that the
number is between 1 and 5.
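The range can be checked empirically. The sketch below, which assumes Bash, draws 1,000 durations with the same expression and records the smallest and largest values seen. The variable names min, max, and i are introduced here for illustration only.

```shell
# Assumes Bash, which provides the RANDOM variable.
min=5
max=1
i=0
while [ "$i" -lt 1000 ]; do
    duration=$((1 + RANDOM % 5))
    if [ "$duration" -lt "$min" ]; then min=$duration; fi
    if [ "$duration" -gt "$max" ]; then max=$duration; fi
    i=$((i + 1))
done
echo "smallest: $min, largest: $max"
```

Whatever values are drawn, the smallest is never below 1 and the largest is never above 5.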
This process does not take up all the resources we have available. And it so happens
that we need to run this command a lot of times. For example, we need to download a
long sequence of files.
A naive way to parallelize is to run the commands in the background:
$ for i in {1..4}; do
> ( ./slow.sh $i ; echo Processed $i ) &
> done
[1] 3334
[2] 3335