Getting Started - Data Science at the Command Line

CHAPTER 2

Getting Started

In this chapter, we are going to make sure that you have all the prerequisites for doing data science at the command line. The prerequisites fall into two parts: (1) having a proper environment with all the command-line tools that we employ in this book, and (2) understanding the essential concepts that come into play when using the command line.

First, we describe how to install the Data Science Toolbox, which is a virtual environment based on GNU/Linux that contains all the necessary command-line tools. Subsequently, we explain the essential command-line concepts through examples.

By the end of this chapter, you'll have everything you need in order to continue with the first step of doing data science, namely obtaining data.

Overview

In this chapter, you'll learn:

• How to set up the Data Science Toolbox

• Essential concepts and tools necessary to do data science at the command line

Setting Up Your Data Science Toolbox

In this book, we use many different command-line tools. The distribution of GNU/Linux that we are using, Ubuntu, comes with a whole bunch of command-line tools pre-installed. Moreover, Ubuntu offers many packages that contain other, relevant command-line tools. Installing these packages yourself is not too difficult. However, we also use command-line tools that are not available as packages and require a more manual, and more involved, installation. In order to acquire the necessary command-
