Introduction - Data Science at the Command Line

CHAPTER 1

Introduction

This topic is about doing data science at the command line. Our aim is to make you a

more efficient and productive data scientist by teaching you how to leverage the

power of the command line.

Having both the terms “data science” and “command line” in the title requires an

explanation. How can a technology that's over 40 years old¹ be of any use to a field

that's only a few years young?

Today, data scientists can choose from an overwhelming collection of exciting

technologies and programming languages. Python, R, Hadoop, Julia, Pig, Hive, and Spark

are but a few examples. You may already have experience in one or more of these. If

so, then why should you still care about the command line for doing data science?

What does the command line have to offer that these other technologies and

programming languages do not?

These are all valid questions. This first chapter will answer these questions as follows.

First, we provide a practical definition of data science that will act as the backbone of

this topic. Second, we list five important advantages of the command line. Third, we

demonstrate the power and flexibility of the command line through a real-world use

case. By the end of this chapter we hope to have convinced you that the command

line is indeed worth learning for doing data science.

¹ The development of the UNIX operating system started back in 1969. It has featured a command line since the

beginning, and the important concept of pipes was added in 1973.
