We convert the JSON data to CSV using json2csv and store it as fashion.csv.
With wc -l (Rubin & MacKenzie, 2012), we find out that this data set contains 4,855
articles, one per line after the header line (and not 5,000, because we probably
retrieved everything from 2009):
$ wc -l fashion.csv
4856 fashion.csv
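The count above includes the header line, which is why wc -l reports one more than the number of articles. The following is a minimal sketch of how the header could be excluded from the count. It uses a tiny stand-in file rather than the real fashion.csv, the timestamps in it are made up for illustration, and it assumes that no field contains an embedded newline:

```shell
# Create a small stand-in CSV (hypothetical rows, not the real fashion.csv).
sample=$(mktemp)
cat > "$sample" <<'EOF'
date,type,title
2009-02-15T00:00:00Z,multimedia,Michael Kors
2009-02-12T00:00:00Z,multimedia,Alexander Wang
EOF

# Total number of lines, header included. The arithmetic expansion strips
# the padding that some wc implementations add.
total=$(( $(wc -l < "$sample") ))

# Data rows only: tail -n +2 starts printing at the second line.
articles=$(( $(tail -n +2 "$sample" | wc -l) ))

echo "lines: $total, articles: $articles"
rm -f "$sample"
```

Run against fashion.csv, the same tail -n +2 ... | wc -l pipeline should report the number of articles directly, under the same assumption about embedded newlines.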
Let's inspect the first 10 lines (the header plus nine articles) to verify that we have succeeded in obtaining the data.
Note that we're applying cols (Janssens, 2014) and cut (Ihnat, MacKenzie, & Meyering,
2012) to the date column in order to leave out the time and time zone information
in the table:
$ < fashion.csv cols -c date cut -dT -f1 | head | csvlook
|-------------+------------+-----------------------------------------|
|  date       | type       | title                                   |
|-------------+------------+-----------------------------------------|
|  2009-02-15 | multimedia | Michael Kors                            |
|  2009-02-20 | multimedia | Recap: Fall Fashion Week, New York      |
|  2009-09-17 | multimedia | UrbanEye: Backstage at Marc Jacobs      |
|  2009-02-16 | multimedia | Bill Cunningham on N.Y. Fashion Week    |
|  2009-02-12 | multimedia | Alexander Wang                          |
|  2009-09-17 | multimedia | Fashion Week Spring 2010                |
|  2009-09-11 | multimedia | Of Color | Diversity Beyond the Runway  |
|  2009-09-14 | multimedia | A Designer Reinvents Himself            |
|  2009-09-12 | multimedia | On the Street | Catwalk                 |
|-------------+------------+-----------------------------------------|
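To see what cut does to a single value of the date column, it can help to try it in isolation. This is only a sketch: the timestamp below is a hypothetical value in the ISO 8601 shape that we assume the date column has before it is trimmed.

```shell
# Hypothetical timestamp; the real values in fashion.csv may differ in detail.
timestamp='2009-02-15T00:00:00Z'

# -dT uses the letter "T" as the field delimiter, and -f1 keeps only the
# first field, which is the part before the "T" (the date).
date_only=$(printf '%s\n' "$timestamp" | cut -dT -f1)

echo "$date_only"
```

In the pipeline above, cols applies this same cut invocation to the date column only and leaves the other columns untouched.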
That seems to have worked! In order to gain any insight, we'd better visualize the data.
Figure 1-3 contains a line graph created with R (R Foundation for Statistical Computing,
2014), Rio (Janssens, 2014), and ggplot2 (Wickham, 2009).
$ < fashion.csv Rio -ge 'g + geom_freqpoly(aes(as.Date(date), color=type), '\
> 'binwidth=7) + scale_x_date() + labs(x="date", title="Coverage of New York'\
> ' Fashion Week in New York Times")' | display
By looking at the line graph, we can infer that New York Fashion Week happens twice
a year. And now we know when: once in February and once in September.
Let's hope that it's going to be the same this year so that we can prepare ourselves! In
any case, we hope that with this example, we've shown that the New York Times API is
an interesting source of data. More importantly, we hope that we've convinced you
that the command line can be a very powerful approach for doing data science.
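The two peaks can also be checked without a plot by counting articles per month. The sketch below uses a small stand-in file instead of the real fashion.csv. The titles come from the table shown earlier, the timestamps are made up, and we assume that date is the first column, that it starts with YYYY-MM, and that no field contains an embedded newline:

```shell
# Small stand-in CSV (hypothetical timestamps, not the real fashion.csv).
sample=$(mktemp)
cat > "$sample" <<'EOF'
date,type,title
2009-02-15T00:00:00Z,multimedia,Michael Kors
2009-02-12T00:00:00Z,multimedia,Alexander Wang
2009-09-17T00:00:00Z,multimedia,Fashion Week Spring 2010
EOF

# Skip the header, keep the first column, truncate it to YYYY-MM,
# then count how often each month occurs.
per_month=$(tail -n +2 "$sample" | cut -d, -f1 | cut -c1-7 | sort | uniq -c)

echo "$per_month"
rm -f "$sample"
```

Each output line holds a count followed by a month. Taking the first comma-separated field is safe here only because the date comes first; titles may themselves contain commas, so this approach would not be reliable for later columns.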
In this section, we've peeked at some important concepts and some exciting
command-line tools. Don't worry if some things don't make sense yet. Most of the
concepts will be discussed in Chapter 2, and in the subsequent chapters we'll go into
more detail for all the command-line tools used in this section.