>   "'http://api.nytimes.com/svc/search/v2/articlesearch.json?q=New+York+'" \
>   "'Fashion+Week&begin_date={1}0101&end_date={1}1231&page={2}&api-key='" \
>   "'<your-api-key>'" ::: {2009..2013} ::: {0..99} > /dev/null
Computers / CPU cores / Max jobs to run
1:local / 4 / 1

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:1/9/100%/0.4s
Basically, we're performing the same query for the years 2009 through 2013. The API only allows up to 100 pages (starting at 0) per query, so we're generating 100 numbers using brace expansion. These numbers are used by the page parameter in the query. We're searching, in each of those years, for articles that contain the search term New+York+Fashion+Week. Because the API imposes rate limits, we ensure that there's only one request at a time, with a one-second delay between requests. Make sure that you replace <your-api-key> with your own API key for the article search endpoint.
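The two brace expansions can be inspected on their own. A minimal sketch (no API calls; the array names are ours, not from the original command) that counts the job combinations parallel will run, since each value from the first ::: source is paired with each value from the second:

```shell
# Each year is combined with each page number, so parallel runs
# 5 * 100 = 500 jobs in total.
years=({2009..2013})   # 5 values: 2009 2010 2011 2012 2013
pages=({0..99})        # 100 values: 0 through 99
echo "years: ${#years[@]}, pages: ${#pages[@]}, jobs: $(( ${#years[@]} * ${#pages[@]} ))"
```

With a one-second delay between requests, those 500 jobs take a little over eight minutes to complete.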
Each request returns 10 articles, so that's 5,000 articles in total. These are sorted by page views, so this should give us a good estimate of the coverage. The results are in JSON format, which we store in the results directory. The command-line tool tree (Baker, 2014) gives an overview of how the subdirectories are structured:
$ tree results | head
results
└── 1
    ├── 2009
    │   └── 2
    │       ├── 0
    │       │   ├── stderr
    │       │   └── stdout
    │       ├── 1
    │       │   ├── stderr
    │       │   └── stdout
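This layout follows parallel's --results convention: one directory level per input source (named 1 and 2 here, because the two ::: sources are unnamed) and one per value, with each job's stdout and stderr stored at the leaf. A small sketch that recreates a slice of this tree under a hypothetical mock-results directory, so the stdout glob used in the next step can be tried without making any API calls:

```shell
# Recreate two leaf directories of the results tree (year 2009,
# pages 0 and 1) and give each an empty stdout and stderr file.
mkdir -p mock-results/1/2009/2/{0,1}
for dir in mock-results/1/2009/2/*/; do
  : > "${dir}stdout"
  : > "${dir}stderr"
done
# The same glob pattern used below to collect every job's output:
ls mock-results/1/*/2/*/stdout
```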
We can combine and process the results using cat (Granlund & Stallman, 2012), jq (Dolan, 2014), and json2csv (Czebotar, 2014):
$ cat results/1/*/2/*/stdout |
>   jq -c '.response.docs[] | {date: .pub_date, type: .document_type, '\
>   'title: .headline.main }' |
>   json2csv -p -k date,type,title > fashion.csv
Let's break down this command:
We combine the output of each of the 500 parallel jobs (or API requests).
We use jq to extract the publication date, the document type, and the headline of each article.