result, parallel cannot determine the number of cores and will default to using one
CPU core. When you receive this warning message, you can do one of the following
four things:
• Don't worry, and be happy with using one CPU core per machine.
• Specify the number of jobs per machine via the -j option.
• Specify the number of cores to use per machine by prefixing each hostname in the instances file with, for example, 2/ if you want two cores.
• Install GNU Parallel using a package manager (note that this is usually not the latest version). For example, on Ubuntu:
$ parallel --nonall --slf instances "sudo apt-get install -y parallel"
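To illustrate the core-prefix option above, a hypothetical instances file that asks for two cores on each of two machines could look like this (the hostnames are made up):

```
2/ec2-host-1.example.com
2/ec2-host-2.example.com
```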
Distributing Local Data Among Remote Machines
The second flavor of distributed processing is to distribute local data directly among
remote machines. Imagine you have one very large data set that you want to process
using multiple remote machines. For simplicity, we're going to sum all integers from 1
to 1,000. First, let's verify that our input is actually being distributed by printing the
hostname of the remote machine and the length of the input it received using wc :
$ seq 1000 | parallel -N100 --pipe --slf hosts "(hostname; wc -l) | paste -sd:"
ip-172-31-23-204:100
ip-172-31-23-205:100
ip-172-31-23-205:100
ip-172-31-23-204:100
ip-172-31-23-205:100
ip-172-31-23-204:100
ip-172-31-23-205:100
ip-172-31-23-204:100
ip-172-31-23-205:100
ip-172-31-23-204:100
We can verify that our 1,000 numbers get distributed evenly in subsets of 100 (as
specified by -N100 ). Now, we're ready to sum all those numbers:
$ seq 1000 | parallel -N100 --pipe --slf hosts "paste -sd+ | bc" |
> paste -sd+ | bc
500500
Here, we immediately sum the 10 partial sums we get back from the remote machines. Let's double-check that the answer is correct:
$ seq 1000 | paste -sd+ | bc
500500
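As a final sanity check that doesn't involve parallel or remote machines at all, Gauss's formula n(n+1)/2 gives the same result using only shell arithmetic:

```shell
$ echo $(( 1000 * 1001 / 2 ))
500500
```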