CHAPTER 3
Obtaining Data
This chapter deals with the first step of the OSEMN model: obtaining data. After all,
without any data, there is not much data science that we can do. We assume that the
data that is needed to solve the data science problem at hand already exists at some
location in some form. Our goal is to get this data onto your computer (or into your
Data Science Toolbox) in a form that we can work with.
According to the Unix philosophy, text is a universal interface. Almost every
command-line tool takes text as input, produces text as output, or both. This is the
main reason why command-line tools can work so well together. However, as we'll
see, even just text can come in multiple forms.
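To make the idea of text as a universal interface concrete, here is a minimal sketch of a pipeline in which each tool reads plain text and writes plain text. The fruit names are made-up sample data, not taken from this chapter, and the tools shown (sort, uniq, head) are common Unix utilities picked only for illustration.

```shell
# Toy input (made up for illustration): one fruit name per line.
fruits='apple
banana
apple
cherry
banana
apple'

# Each tool reads text on stdin and writes text on stdout:
#   sort      puts identical lines next to each other
#   uniq -c   collapses each run of identical lines into "count value"
#   sort -rn  orders the lines by count, highest first
#   head -n 1 keeps only the first line
top=$(printf '%s\n' "$fruits" | sort | uniq -c | sort -rn | head -n 1)

# Print the most frequent value together with its count.
echo "$top"
```

None of these tools knows anything about the others. They can be combined only because each one's output is text that the next one can read, which here yields apple with a count of 3.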
Data can be obtained in several ways—for example by downloading it from a server,
by querying a database, or by connecting to a web API. Sometimes, the data comes in
a compressed form or in a binary format such as Microsoft Excel. In this chapter, we
discuss several tools that help with these tasks from the command line, including curl
(Stenberg, 2012), in2csv (Groskopf, 2014), sql2csv (Groskopf, 2014), and tar
(Bailey, Eggert, & Poznyakoff, 2014).
Overview
In this chapter, you'll learn how to:
• Download data from the Internet
• Query databases
• Connect to web APIs
 