Example 8-2. Parallel bc (pbc)
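#!/usr/bin/env bash
# $1 is the bc expression; parallel fills in {1} from each input line.
# -C, splits input columns on commas, -k keeps output in input order,
# -j100% runs one job per CPU core.
parallel -C, -k -j100% "echo '$1' | bc -l"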
This tool allows us to simplify the code used in the beginning of the chapter, too:
$ seq 100 | pbc '{1}^2' | tail
8281
8464
8649
8836
9025
9216
9409
9604
9801
10000
This tool works as follows. You may remember that seq 100 generates the integers 1 to
100, one per line. These lines get piped to pbc, which, in turn, feeds them to
parallel. The expression passed to pbc contains the placeholder {1}, which parallel
fills in before it sends the expression to bc. This means that {1} gets replaced by
the value of the first column (there is only one column) on the line in question.
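The substitution step can be sketched without GNU Parallel at all. The following is only a sequential illustration of what happens to {1}, not how parallel is implemented; the function name expand is hypothetical and exists purely for this example.

```shell
#!/usr/bin/env bash
# Sequential sketch of the {1} substitution only; it does not run jobs
# in parallel and does not call bc. "expand" is a hypothetical helper.
expand() {
  local template=$1 value=$2
  local placeholder='{1}'
  # Replace every literal {1} in the template with the input value.
  printf '%s\n' "${template//"$placeholder"/$value}"
}

# Each input line becomes the expression that would be handed to bc.
seq 3 | while read -r n; do
  expand '{1}^2' "$n"
done
# prints:
# 1^2
# 2^2
# 3^2
```

Piping each of these expressions to bc -l would then yield the squares shown in the output above.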
Distributed Processing
Sometimes you need more power than your local machine, even with all its cores, can
offer. Luckily, GNU Parallel can also leverage the power of remote machines, which
really allows us to speed up our pipeline.
What's great is that GNU Parallel does not have to be installed on the remote
machine. All that's required is that you can connect to the remote machine via SSH,
which is also what GNU Parallel uses to distribute our pipeline. (Having GNU
Parallel installed remotely is helpful because it can then determine how many cores
to employ on each remote machine; more on this later.)
First, we're going to obtain a list of running AWS EC2 instances. Don't worry if you
don't have any remote machines: you can replace any occurrence of --slf instances,
which tells GNU Parallel which remote machines to use, with --sshlogin :, which
tells it to use the local machine instead.
This way, you can still follow along with the examples in this section.
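As a small convenience, the choice between the two options can be wrapped in a helper. This is a minimal sketch under assumptions: the function name login_args is hypothetical, and it assumes the list of remote machines is stored in a file called instances in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the login option to pass to GNU Parallel.
# Assumes the remote machines are listed in a file (default: instances).
# Falls back to the local machine when that file is missing or empty.
login_args() {
  local slf=${1:-instances}
  if [ -s "$slf" ]; then
    printf '%s\n' "--slf $slf"
  else
    printf '%s\n' "--sshlogin :"
  fi
}
```

With GNU Parallel installed, a command such as seq 3 | parallel $(login_args) echo would then run on the remote machines when the file exists and on the local machine otherwise.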
Once we know which remote machines to take over, we're going to consider three
flavors of distributed processing: