Good, that works. If you have a larger command that you want to execute on the
remote machines, you can also put it in a separate script and upload it with parallel.
In our case, let's create a simple command-line tool called sum:
#!/usr/bin/env bash
paste -sd+ | bc
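To make the two stages of this script visible, here is a small sketch that can be run locally. The helper name expr_of is hypothetical, the explicit - operand is added so that paste reads standard input on non-GNU systems too, and bash arithmetic stands in for bc so the sketch has no extra dependencies:

```shell
#!/usr/bin/env bash
# Stage 1: join all input lines into one expression, separated by "+".
# (expr_of is a hypothetical helper name, not part of the original tool.)
expr_of() {
  paste -sd+ -
}

seq 5 | expr_of                   # prints 1+2+3+4+5

# Stage 2: evaluate the expression. The original tool pipes it to bc;
# bash arithmetic is used here only as a stand-in.
echo $(( $(seq 5 | expr_of) ))    # prints 15
```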
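Don't forget to make it executable as discussed in Chapter 4. The following command
first uploads the file sum to each remote machine (--basefile), then pipes the input in
chunks of 100 numbers (-N100 --pipe) to ./sum on the machines listed in the file
instances (--slf), and finally adds up the partial sums locally: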
$ seq 1000 | parallel -N100 --basefile sum --pipe --slf instances './sum' |
> ./sum
500500
Of course, summing 1,000 numbers is only a toy example. It would have been much
faster to do this locally. However, we hope it's clear from this toy example that GNU
Parallel can be incredibly powerful.
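The reason the command ends with a second, local ./sum is that every remote job returns only a partial sum. The following sketch emulates that chunk-and-combine pattern on a single machine, without parallel or any remote hosts. The function names partial_sums and sum_lines are hypothetical, and awk is used in place of paste and bc purely to keep the sketch self-contained:

```shell
#!/usr/bin/env bash
# Local emulation of chunk-and-combine; no remote machines are involved.

# Hypothetical helper: add up the numbers on stdin, one per line.
sum_lines() {
  awk '{ total += $1 } END { print total + 0 }'
}

# Hypothetical helper: emit one partial sum per chunk of 100 lines,
# standing in for what each job started with -N100 --pipe would return.
partial_sums() {
  awk -v n=100 '
    { total += $1 }
    NR % n == 0 { print total; total = 0 }
    END { if (NR % n != 0) print total }
  '
}

seq 1000 | partial_sums | sum_lines    # prints 500500
```

The ten partial sums (5050, 15050, and so on) are what the final stage receives. With real remote jobs the partial results may arrive in a different order, which does not matter for addition.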
Processing Files on Remote Machines
The third flavor of distributed processing is to send files to remote machines, process
them, and retrieve the results. Imagine that we want to count, for each borough of
New York City, how many 311 service calls it receives. We don't have that data on
our local machine yet, so let's first obtain it from NYC Open Data using its API:
$ seq 0 100 900 | parallel "curl -sL 'http://data.cityofnewyork.us/resource"\
> "/erm2-nwe9.json?\$limit=100&\$offset={}' | jq -c '.[]' | gzip > {#}.json.gz"
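To see what this command asks for before running it, the following dry-run sketch prints the URL and output file of each job without downloading anything. The function name plan_downloads is hypothetical, and the loop counter only mirrors parallel's {#} placeholder (the job sequence number, starting at 1):

```shell
#!/usr/bin/env bash
# Dry run: list the request URL and target file per job; nothing is downloaded.
# (plan_downloads is a hypothetical helper used only for illustration.)
plan_downloads() {
  local base='http://data.cityofnewyork.us/resource/erm2-nwe9.json'
  local job=0 offset
  for offset in $(seq 0 100 900); do
    job=$((job + 1))    # plays the role of {#}
    # $offset plays the role of {}; the literal $limit and $offset are query parameters.
    printf '%s?$limit=100&$offset=%s -> %s.json.gz\n' "$base" "$offset" "$job"
  done
}

plan_downloads
```

Each of the ten requests fetches 100 records, so offsets 0 through 900 cover the first 1,000 records.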
Note that jq -c '.[]' is used to flatten the array of JSON objects so that there's one
line per object. We now have 10 files containing compressed JSON data. Let's see what
one line of JSON looks like:
$ zcat 1.json.gz | head -n 1 | fold
{"school_region":"Unspecified","park_facility_name":"Unspecified","x_coordinate_
state_plane":"945974","agency_name":"Department of Health and Mental Hygiene","u
nique_key":"147","facility_type":"N/A","status":"Assigned","school_address":"Uns
pecified","created_date":"2006-08-29T21:25:23","community_board":"01 STATEN ISLA
ND","incident_zip":"10302","school_name":"Unspecified","location":{"latitude":"4
0.62745427115626","longitude":"-74.13789056665027","needs_recoding":false},"comp
laint_type":"Food Establishment","city":"STATEN ISLAND","park_borough":"STATEN I
SLAND","school_state":"Unspecified","longitude":"-74.13789056665027","intersecti
on_street_1":"DECKER AVENUE","y_coordinate_state_plane":"167905","due_date":"200
6-10-05T21:25:23","latitude":"40.62745427115626","school_code":"Unspecified","sc
hool_city":"Unspecified","address_type":"INTERSECTION","intersection_street_2":"
BARRETT AVENUE","school_number":"Unspecified","resolution_action_updated_date":"
2006-10-06T00:00:17","descriptor":"Handwashing","school_zip":"Unspecified","loca
tion_type":"Restaurant/Bar/Deli/Bakery","agency":"DOHMH","borough":"STATEN ISLAN
D","school_phone_number":"Unspecified"}
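Because every object sits on its own line, line-oriented tools can already pick out a single field. The sketch below is a rough illustration only: the helper name borough_of is hypothetical, the input is a shortened stand-in for the line above, and it assumes that borough is a flat string value. A JSON-aware tool such as jq is the more robust choice for real data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the value of the "borough" key out of each
# one-object-per-line JSON record on stdin, using only grep and cut.
borough_of() {
  # The leading quote keeps "park_borough" from matching.
  grep -o '"borough":"[^"]*"' | cut -d'"' -f4
}

# Shortened stand-in for one line of the downloaded data.
printf '%s\n' \
  '{"park_borough":"STATEN ISLAND","agency":"DOHMH","borough":"STATEN ISLAND"}' |
  borough_of    # prints STATEN ISLAND
```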
If we were to get the total number of service calls per borough on our local machine,
we would run the following command: