already written) into a command-line tool. We will cover this extensively in
Chapter 4. Moreover, the command line can easily cooperate with various databases and
file types such as Microsoft Excel.
In the end, every technology has its advantages and disadvantages (including the
command line), so it's good to know several and use whichever is most appropriate
for the task at hand. Sometimes that means using R, sometimes the command line,
and sometimes even pen and paper. By the end of this topic, you'll have a solid
understanding of when you could use the command line, and when you're better off
continuing with your favorite programming language or statistical computing
environment.
The Command Line Is Scalable
Working on the command line is very different from using a graphical user interface
(GUI). On the command line you do things by typing, whereas with a GUI, you do
things by pointing and clicking with a mouse.
Everything that you type manually on the command line can also be automated
through scripts and tools. This makes it very easy to re-run your commands when
you made a mistake, when the data set changed, or when a colleague wants to
perform the same analysis. Moreover, your commands can be run at specific
intervals, on a remote server, and in parallel on many chunks of data (more on that in
Chapter 8).
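As a minimal sketch of this idea, the script below wraps a small analysis in a function and re-runs it on several chunks of data in parallel using background jobs. The function name count_distinct, the sample files, and the analysis itself are hypothetical and chosen only for illustration; it assumes a Bash shell with the standard cut, sort, wc, and tr tools.

```shell
#!/usr/bin/env bash
# Hypothetical analysis: count the distinct values in the first
# comma-separated column of a file.
count_distinct() {
    cut -d, -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Create two tiny sample chunks so the sketch is self-contained.
workdir=$(mktemp -d)
printf 'a,1\nb,2\na,3\n' > "$workdir/chunk1.csv"
printf 'c,4\nc,5\n' > "$workdir/chunk2.csv"

# Run the same analysis on every chunk, each as a background job.
for f in "$workdir"/chunk*.csv; do
    count_distinct "$f" > "$f.count" &
done
wait  # block until all background jobs have finished

# Show the per-chunk results.
cat "$workdir"/chunk*.csv.count
```

Because the analysis lives in a script rather than in a sequence of mouse clicks, re-running it after the data changes amounts to invoking the script again.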
Because the command line is automatable, it becomes scalable and repeatable. It is
not straightforward to automate pointing and clicking, which makes a GUI a less
suitable environment for doing scalable and repeatable data science.
The Command Line Is Extensible
The command line itself was invented over 40 years ago. Its core functionality has
largely remained unchanged, but the tools, which are the workhorses of the command
line, are being developed on a daily basis.
The command line itself is language agnostic. This allows the command-line tools to
be written in many different programming languages. The open source community is
producing many free and high-quality command-line tools that we can use for data
science.
These command-line tools can work together, which makes the command line very
flexible. You can also create your own tools, allowing you to extend the effective
functionality of the command line.
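To illustrate how tools work together and how you might add one of your own, here is a small sketch assuming a Bash shell. The function top_lines is a hypothetical custom tool, not part of any standard toolset; it combines sort, uniq, and head, and is then chained with tr and awk through pipes.

```shell
#!/usr/bin/env bash
# Hypothetical custom tool: print the N most frequent lines read from
# standard input (N defaults to 3), each prefixed with its count.
top_lines() {
    sort | uniq -c | sort -rn | head -n "${1:-3}"
}

# Combine it with existing tools: put one word per line, find the most
# frequent word, and keep only the word itself (dropping the count).
result=$(printf 'a b a c a b\n' | tr ' ' '\n' | top_lines 1 | awk '{print $2}')
echo "$result"
```

Each tool in the pipeline does one small job, and the new function behaves like any other tool once it is defined, so it can be reused in other pipelines.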