goes in one side and comes out the other side. The output of one Unix program can
be piped into the input of another. Arbitrary-length chains of applications can be
connected together, with data flowing out from one and into another.
Unless you explicitly state otherwise, stdin is read from the keyboard or, inside a
pipeline, from the output of another application, whereas stdout is written to the
terminal screen as text. For example, if you type a command such as ls (which produces
a list of files in the current directory), its stdout will be sent to the terminal.
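To see this from inside a program, the following minimal Python sketch reports whether each standard stream is attached to a terminal or has been redirected. The function name describe is hypothetical and chosen only for illustration.

```python
import sys


def describe(stream):
    """Report whether a stream is attached to a terminal or redirected."""
    return "terminal" if stream.isatty() else "pipe or file"


if __name__ == "__main__":
    # Report on stderr so the messages stay visible even when stdout is piped.
    print("stdin: ", describe(sys.stdin), file=sys.stderr)
    print("stdout:", describe(sys.stdout), file=sys.stderr)
```

Saved under an assumed name such as streams.py and run directly from an interactive shell, the script should report both streams as a terminal. Run in the middle of a pipeline, with another command feeding it and another consuming its output, it should report both as a pipe or file.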
By connecting two applications together with the pipe operator ( | ) we can run very
powerful computations with just a few commands. In the example in Listing 8.1, we
demonstrate piping the output of ls to the word count utility wc to produce a count
of the lines that ls produces. Similarly, we can run a trivial data transformation by
piping the text output of echo to a sed string-substitution command.
Listing 8.1 Examples of a Unix command-line pipe to redirect stdout
# How many files are in the current directory?
# Pipe the result of ls to wc (word count)
> ls | wc -l
10
# Display text to stdout (terminal)
> echo "Here's some test data"
Here's some test data
# Use a pipe to redirect output to another program,
# in this case a text replace with the sed program
> echo "Here's some test data" | sed 's/Here/There/'
There's some test data
Simple, right? The Unix pipeline examples in Listing 8.1 take advantage of built-in
command-line tools. We can run useful data analysis processes with these tools alone.
Remember: Sometimes the best way to deal with a data challenge is to use the
simplest solution.
For more complex or custom tasks, we will probably want a more expressive way to
process data. A great solution is to use a general-purpose language such as Python.
Python's core libraries feature easy-to-use system modules for scripting tasks.
Let's take a look at a slightly more complex example by writing our own Python scripts
that manipulate data provided from standard input. In Listing 8.2, input_filter.py
takes string input from stdin, removes every character that is not a space or a
lowercase ASCII letter, and writes the filtered string to stdout. Another Python
script, output_unique.py (Listing 8.3), splits the resulting string into words and
outputs only the unique words.
By piping the output of input_filter.py to output_unique.py, we can produce a list of
unique terms. Something much like this simple example could be the first step in
producing a search index for individual phrases or records.
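As a rough sketch of the two stages just described, the Python below combines both steps in one file. It is not the text of Listings 8.2 and 8.3. The names filter_text, unique_words, and run are hypothetical, and printing the words one per line in first-seen order is an assumption. Under a literal reading of the filter rule, newlines are dropped along with punctuation and uppercase letters.

```python
import io
import string
import sys

# Characters kept by the filter stage: lowercase ASCII letters and the space.
ALLOWED = set(string.ascii_lowercase + " ")


def filter_text(text):
    """Drop every character that is not a space or a lowercase ASCII letter."""
    return "".join(ch for ch in text if ch in ALLOWED)


def unique_words(text):
    """Split on whitespace and return each distinct word once, in first-seen order."""
    return list(dict.fromkeys(text.split()))


def run(stream_in, stream_out):
    """Read all input, apply both stages, and write one unique word per line."""
    for word in unique_words(filter_text(stream_in.read())):
        stream_out.write(word + "\n")


if __name__ == "__main__":
    # Demo on an in-memory stream; pass sys.stdin instead to use this in a real pipe.
    run(io.StringIO("Here's some test data\n"), sys.stdout)
```

Replacing the in-memory stream with sys.stdin turns the script into a pipeline stage that can sit after echo or any other command, in the same way as sed in Listing 8.1.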
 
 