CHAPTER 1
Introduction
This topic is about doing data science at the command line. Our aim is to make you a
more efficient and productive data scientist by teaching you how to leverage the
power of the command line.
Having both the terms “data science” and “command line” in the title requires an
explanation. How can a technology that's over 40 years old¹ be of any use to a field
that's only a few years young?
Today, data scientists can choose from an overwhelming collection of exciting
technologies and programming languages. Python, R, Hadoop, Julia, Pig, Hive, and Spark
are but a few examples. You may already have experience in one or more of these. If
so, then why should you still care about the command line for doing data science?
What does the command line have to offer that these other technologies and
programming languages do not?
These are all valid questions, and this first chapter answers them as follows.
First, we provide a practical definition of data science that will act as the backbone of
this topic. Second, we list five important advantages of the command line. Third, we
demonstrate the power and flexibility of the command line through a real-world use
case. By the end of this chapter, we hope to have convinced you that the command
line is indeed worth learning for doing data science.
¹ The development of the UNIX operating system started back in 1969. It featured a command line from the
beginning, and the important concept of pipes was added in 1973.