There are three main reasons for creating command-line tools in a programming
language instead of Bash. First, you may have existing code that you wish to be able to
use from the command line. Second, the command-line tool would end up
encompassing more than a hundred lines of code. Third, the command-line tool needs to be
very fast.
The six steps in the previous section roughly apply to creating command-line tools in
other programming languages as well. The first step, however, would not be copying
and pasting from the command line, but rather copying and pasting the relevant code
into a new file. Command-line tools in Python and R need to specify python (Python
Software Foundation, 2014) and Rscript (R Foundation for Statistical Computing,
2014), respectively, as the interpreter after the shebang.
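As a minimal sketch of what such a file might look like in Python, the following script places the interpreter after the shebang and echoes standard input in upper case. The tool itself and the function name shout are hypothetical and serve only to illustrate the layout.

```python
#!/usr/bin/env python
# Hypothetical example tool: echoes standard input in upper case.
# The shebang above asks env to locate the python interpreter on the PATH.
import sys


def shout(text):
    """Return the given text converted to upper case."""
    return text.upper()


if __name__ == "__main__":
    # Read everything from standard input and write the result to standard output.
    sys.stdout.write(shout(sys.stdin.read()))
```

Assuming the file is saved under a name of your choosing and made executable with chmod u+x, it can be placed in a pipeline like any other command-line tool.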
When it comes to creating command-line tools using Python and R, there are two
more aspects that deserve special attention, which will be discussed next. First,
processing standard input, which comes naturally to shell scripts, has to be taken care of
explicitly in Python and R. Second, as command-line tools written in Python and R
tend to be more complex, we may also want to offer the user the ability to specify
more complex command-line arguments.
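Both aspects can be sketched together in Python. The example below is an illustration under stated assumptions, not the approach taken later in this chapter: it uses the standard argparse module for the arguments and reads sys.stdin explicitly. The tool's purpose (printing the first lines of its input), the -n/--num-lines option, its default of 10, and the function names are all hypothetical.

```python
#!/usr/bin/env python
# Sketch only: the option name, its default, and the helper names are hypothetical.
import argparse
import sys
from itertools import islice


def build_parser():
    """Create the command-line argument parser."""
    parser = argparse.ArgumentParser(
        description="Print the first lines of standard input.")
    parser.add_argument("-n", "--num-lines", type=int, default=10,
                        help="number of lines to print (default: 10)")
    return parser


def head(lines, num_lines):
    """Return the first num_lines items of an iterable of lines."""
    return list(islice(lines, num_lines))


def main(argv=None):
    # With argv=None, argparse falls back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    # Standard input must be read explicitly; iterating over sys.stdin
    # yields one line at a time, including the trailing newline.
    for line in head(sys.stdin, args.num_lines):
        sys.stdout.write(line)


if __name__ == "__main__":
    main()
```

Keeping the parsing in build_parser and the logic in head means each part can be exercised separately from the code that touches standard input.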
Porting the Shell Script
As a starting point, let's see how we would port the prior shell script to both Python
and R. In other words, what Python and R code gives us the most often-used words
from standard input? It is not important whether implementing this task in anything
other than a shell programming language is a good idea. What matters is that it gives
us a good opportunity to compare Bash with Python and R.
We will first show the two files top-words.py and top-words.R and then discuss the
differences with the shell code. In Python, the code would look something like
Example 4-5.
Example 4-5. ~/book/ch04/top-words.py
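#!/usr/bin/env python
import re
import sys
from collections import Counter

# Number of words to report, taken from the first command-line argument
num_words = int(sys.argv[1])
# Read all of standard input and convert it to lowercase
text = sys.stdin.read().lower()
# Split on runs of non-word characters; drop the empty strings that
# re.split yields when the text starts or ends with such a run
words = [word for word in re.split(r'\W+', text) if word]
cnt = Counter(words)
for word, count in cnt.most_common(num_words):
    print("%7d %s" % (count, word))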