$ seq 5 | parallel "echo Hi {}" >> data/one-big-file.txt
However, GNU Parallel offers the --results option, which stores the output of each
job into a separate file, where the filename is based on the input values:
$ seq 5 | parallel --results data/outdir "echo Hi {}"
Hi 1
Hi 2
Hi 3
Hi 4
Hi 5
$ find data/outdir
data/outdir
data/outdir/1
data/outdir/1/1
data/outdir/1/1/stderr
data/outdir/1/1/stdout
data/outdir/1/3
data/outdir/1/3/stderr
data/outdir/1/3/stdout
data/outdir/1/5
data/outdir/1/5/stderr
data/outdir/1/5/stdout
data/outdir/1/2
data/outdir/1/2/stderr
data/outdir/1/2/stdout
data/outdir/1/4
data/outdir/1/4/stderr
data/outdir/1/4/stdout
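To illustrate how the stored results might be used afterwards, here is a minimal sketch that reads back the output of a single job. It assumes GNU Parallel is installed and relies on the directory layout shown in the listing above (data/outdir/1/<input value>/stdout); the variable name third is only illustrative.

```shell
# Assumes GNU Parallel is installed and that data/ is writable.
mkdir -p data
# Run the jobs; discard the copy of the output that is also printed.
seq 5 | parallel --results data/outdir "echo Hi {}" > /dev/null
# Read back the stdout captured for the job whose input value was 3.
third=$(cat data/outdir/1/3/stdout)
echo "$third"   # prints: Hi 3
```

Because each job gets its own stdout and stderr file, the output of a failed job can be inspected without rerunning the others.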
When you're running multiple jobs in parallel, the jobs may not finish in the same
order as the input, so their output can appear out of order as well. To keep the
output in the order of the input, specify the --keep-order option (or -k for short).
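The effect of -k is easiest to see when later jobs finish before earlier ones. The following sketch assumes GNU Parallel is installed; the sleep durations are made up purely to force the first job to finish last.

```shell
# Assumes GNU Parallel is installed. The sleeps are illustrative only:
# job 1 sleeps 2 seconds, job 2 sleeps 1 second, job 3 does not sleep,
# so on a multi-core machine job 1 finishes last.
ordered=$(seq 3 | parallel -k 'sleep $((3 - {})); echo Hi {}')
echo "$ordered"
# prints:
# Hi 1
# Hi 2
# Hi 3
```

Without -k, the same command would typically print the lines in the order the jobs complete, which here is the reverse of the input order.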
Sometimes it's useful to record which input generated which output. GNU Parallel
allows you to tag the output with the --tag option:
$ seq 5 | parallel --tag "echo Hi {}"
1	Hi 1
2	Hi 2
3	Hi 3
4	Hi 4
5	Hi 5
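With --tag, GNU Parallel puts the input value and a tab character in front of each output line, which makes the tagged output easy to filter with standard tools. The sketch below assumes GNU Parallel and awk are installed; the variable name third is only illustrative.

```shell
# Assumes GNU Parallel is installed. Each output line has the form
# "<input value><TAB><job output>", so awk can split on the tab.
third=$(seq 5 | parallel --tag "echo Hi {}" | awk -F'\t' '$1 == 3 { print $2 }')
echo "$third"   # prints: Hi 3
```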
Creating Parallel Tools
The bc tool, which we used at the beginning of the chapter, is not parallel by itself.
However, we can parallelize it using parallel. The Data Science Toolbox contains a
tool called pbc (Janssens, 2014). Its source code is shown in Example 8-2.