directory), (2) can be customized by the user (e.g., it can also show the time or the
current git (Torvalds & Hamano, 2014) branch you're working on), and (3) is
irrelevant for the commands themselves.
In the next chapter we'll explain much more about essential command-line concepts.
First, though, we explain why you should learn to use the command line for doing
data science.
Why Data Science at the Command Line?
The command line has many advantages that can make you a more efficient and
productive data scientist. Roughly grouped, the advantages are that the command
line is agile, augmenting, scalable, extensible, and ubiquitous. We elaborate on
each advantage below.
The Command Line Is Agile
The first advantage of the command line is that it allows you to be agile. Data science
has a very interactive and exploratory nature, and the environment that you work in
needs to allow for that. The command line achieves this by two means.
First, the command line provides a so-called read-eval-print loop (REPL). This
means that you type in a command, press <Enter>, and the command is evaluated
immediately. A REPL is often much more convenient for doing data science than the
edit-compile-run-debug cycle associated with scripts, large programs, and, say,
Hadoop jobs. Your commands are executed immediately, may be stopped at will, and
can be changed quickly. This short iteration cycle really allows you to play with your
data.
Second, the command line is very close to the filesystem. Because data is the main
ingredient for doing data science, it is important to be able to easily work with the
files that contain your data set. The command line offers many convenient tools for
this.
The Command Line Is Augmenting
Whatever technology your data science workflow currently includes (whether it's R,
IPython, or Hadoop), you should know that we're not suggesting you abandon that
workflow. Instead, the command line is presented here as an augmenting technology
that amplifies the technologies you're currently employing.
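As a rough illustration of the command line augmenting an existing Python workflow, the sketch below runs an external program from Python and captures its output using the standard library's subprocess module. The helper name run_tool is hypothetical, and the Python interpreter itself stands in for a real command-line tool (a word counter in the spirit of wc -w) so that the example does not assume any particular tool is installed.

```python
import subprocess
import sys


def run_tool(args, input_text=None):
    """Run a command-line tool and return its standard output as a string.

    `run_tool` is a hypothetical helper for illustration only.
    """
    result = subprocess.run(
        args,
        input=input_text,      # text sent to the tool's standard input
        capture_output=True,   # collect stdout and stderr instead of printing them
        text=True,             # work with str rather than bytes
        check=True,            # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Assumption: the Python interpreter acts as a stand-in for a word-counting tool.
word_counter = [
    sys.executable,
    "-c",
    "import sys; print(len(sys.stdin.read().split()))",
]

word_count = run_tool(word_counter, input_text="data science at the command line")
print(word_count.strip())
```

In practice, the argument list would name whichever command-line tool is available on your system; the rest of the call stays the same.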
The command line integrates well with other technologies. On the one hand, you can
often employ the command line from your own environment. Python and R, for
instance, allow you to run command-line tools and capture their output. On the
other hand, you can turn your code (e.g., a Python or R function that you have