may be better off using a programming language. As you gain more experience on
the command line, you will start to recognize when to use which approach. When
everything is a command-line tool, you can even split up the task into subtasks, and
combine a Bash command-line tool with, say, a Python command-line tool.
Whichever approach works best for the task at hand!
Processing Streaming Data from Standard Input
In the previous two code examples, both Python and R read the complete standard
input at once. On the command line, most command-line tools pipe data to the next
command-line tool in a streaming fashion. (There are a few command-line tools that
require the complete data before they write any data to standard output, like sort,
or awk (Brennan, 1994) when a program prints only from its END block.) This means
the pipeline is blocked by such command-line
tools. This does not have to be a problem when the input data is finite, like a file.
However, when the input data is a nonstop stream, such blocking command-line
tools are useless.
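The difference between a blocking and a streaming tool can be sketched with Python iterators. This is only an illustration, not how sort or any other tool is implemented: the function names blocking_sort, streaming_upper, and traced are hypothetical, and traced exists solely to record how much input has been consumed at the moment the first output line appears.

```python
from typing import Iterable, Iterator, List


def blocking_sort(lines: Iterable[str]) -> Iterator[str]:
    """Behaves like sort: consumes all input before emitting anything."""
    # sorted() has to see the last line before it knows the first result.
    yield from sorted(lines)


def streaming_upper(lines: Iterable[str]) -> Iterator[str]:
    """Behaves like a streaming filter: emits each line as it is read."""
    for line in lines:
        yield line.upper()


def traced(lines: Iterable[str], log: List[str]) -> Iterator[str]:
    """Hypothetical helper: record every line pulled from the source."""
    for line in lines:
        log.append(line)
        yield line


# How much input is consumed before the first output line is available?
stream_log: List[str] = []
first_streamed = next(streaming_upper(traced(["b", "a", "c"], stream_log)))

block_log: List[str] = []
first_sorted = next(blocking_sort(traced(["b", "a", "c"], block_log)))

print(first_streamed, len(stream_log))  # one line read so far
print(first_sorted, len(block_log))     # all three lines read
```

With an endless source, the streaming function keeps producing output, whereas the blocking one would never yield its first line.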
Luckily, Python and R can both process data in a streaming manner. You can apply a
function on a line-by-line basis, for example. Examples 4-7 and 4-8 are two minimal
examples that demonstrate how this works in Python and R, respectively. They compute
the square of every integer that is piped to them.
Example 4-7. ~/book/ch04/stream.py
#!/usr/bin/env python
from sys import stdin, stdout
while True:
    line = stdin.readline()
    if not line:
        break
    stdout.write("%d\n" % int(line)**2)
    stdout.flush()
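The same line-by-line idea can also be written as a function that takes its input and output streams as arguments, which makes it easy to try out without a real pipe. The following is a minimal sketch assuming Python 3. The name square_stream and the choice to skip blank lines are assumptions made for this illustration and are not part of the example above.

```python
import io
from typing import TextIO


def square_stream(infile: TextIO, outfile: TextIO) -> None:
    """Write the square of each integer line as soon as it is read."""
    for line in infile:  # a text stream yields one line per iteration
        if not line.strip():
            continue  # assumption: silently skip blank lines
        outfile.write("%d\n" % (int(line) ** 2))
        outfile.flush()  # push each result downstream immediately


# Demonstration with in-memory streams standing in for a pipe.
demo_out = io.StringIO()
square_stream(io.StringIO("1\n2\n3\n"), demo_out)
print(demo_out.getvalue(), end="")
```

In an actual command-line tool, you would call square_stream(sys.stdin, sys.stdout) after importing sys. The flush after every line matters because standard output is typically block-buffered when it is connected to a pipe, so without it the next tool might not see results until the buffer fills.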
Example 4-8. ~/book/ch04/stream.R
#!/usr/bin/env Rscript
f <- file("stdin")
open(f)
while(length(line <- readLines(f, n = 1)) > 0) {
    write(as.integer(line)^2, stdout())
}
close(f)