$ curl j.mp/locatbbar
<html>
<head>
<title>bit.ly</title>
</head>
<body>
<a href="http://en.wikipedia.org/wiki/List_of_countries_and_territories_by_border/area_ratio">moved here</a>
</body>
By specifying the -I or --head option, curl fetches only the HTTP header of the response:
$ curl -I j.mp/locatbbar
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Wed, 21 May 2014 18:50:28 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Cache-Control: private; max-age=90
Content-Length: 175
Location: http://en.wikipedia.org/wiki/List_of_countries_and_territories_by_bo
Mime-Version: 1.0
Set-Cookie: _bit=537cf574-002ba-07d79-2e1cf10a;domain=.j.mp;expires=Mon Nov 17
The first line indicates the HTTP status code, which is 301 (moved permanently) in this case. You can also see the location this URL redirects to: http://en.wikipedia.org/wiki/List_of_countries_and_territories_by_border/area_ratio. Inspecting the header and checking the status code is a useful debugging step when curl does not give you the expected result. Other common HTTP status codes include 404 (not found) and 403 (forbidden). (See Wikipedia for a list of all HTTP status codes.)
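This kind of header check can also be scripted. The following is a minimal sketch, assuming the response headers have the same shape as the output of curl -I shown earlier. The function names http_status and redirect_target are hypothetical helpers introduced here for illustration; they are not part of curl.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: both read HTTP response headers on standard input,
# for example the output of `curl -sI URL`.

# Print the status code from the status line (the first line), e.g. "301".
http_status() {
  awk 'NR == 1 { print $2; exit }'
}

# Print the value of the Location header, if the response has one.
# Carriage returns are stripped first because HTTP header lines end in CRLF.
redirect_target() {
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Usage (requires network access):
#   curl -sI j.mp/locatbbar | http_status
#   curl -sI j.mp/locatbbar | redirect_target
```

Keeping the parsing separate from the download means the same helpers work on headers saved to a file, which is convenient when debugging a request after the fact.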
To conclude this section, cURL is a straightforward command-line tool for downloading data from the Internet. Its three most common options are -s to suppress the progress meter, -u to specify a username and password, and -L to automatically follow redirects. See its man page for more information.
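The options mentioned in this section can be combined. As a small sketch, the wrapper below always passes -s and -L and hands any further arguments, such as -u, through to curl. The name fetch, the user name alice, and the example.com URL are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: download quietly (-s) and follow redirects (-L).
# Any extra arguments are passed to curl unchanged.
fetch() {
  curl -s -L "$@"
}

# Usage (requires network access):
#   fetch j.mp/locatbbar > page.html
#   fetch -u alice https://example.com/private/data.csv
# When -u is given only a user name, curl prompts for the password.
```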
Calling Web APIs
In the previous section we explained how to download individual files from the Internet. Another way data can come from the Internet is through a web API (API stands for application programming interface). The number of APIs offered by organizations is growing at an ever-increasing rate, which means a lot of interesting data is available for us data scientists!
Web APIs are not meant to be presented in a nice layout the way websites are. Instead, most web APIs return data in a structured format, such as JSON or XML. Having data in a structured form has the advantage that the data can be easily processed by