Downloading the ebook using curl.
Converting the entire text to lowercase using tr (Meyering, 2012).
Extracting all the words using grep (Meyering, 2012) and putting each word on a separate line.
Sorting these words in alphabetical order using sort (Haertel & Eggert, 2012).
Removing all the duplicates and counting how often each word appears in the list using uniq (Stallman & MacKenzie, 2012).
Sorting this list of unique words by their count in descending order using sort.
Keeping only the top 10 lines (i.e., words) using head.
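The steps above chain together into a single pipeline. A minimal sketch follows; the function name top_words and the sample input are illustrative, and in practice the first step would pipe the output of curl into it:

```bash
# Sketch of the pipeline described above. The function name top_words and
# the sample input are assumptions for illustration; the real pipeline
# would receive the ebook's text from curl instead.
top_words () {
  tr '[:upper:]' '[:lower:]' |  # convert the entire text to lowercase
  grep -oE '[a-z]+' |           # extract the words, one per line
  sort |                        # sort the words alphabetically
  uniq -c |                     # remove duplicates, prefixing each word with its count
  sort -nr |                    # sort by count, descending
  head -n 10                    # keep only the top 10
}

# Try it on a short sample instead of a full ebook:
printf 'the cat and the dog and the bird\n' | top_words
```

Note that sort appears twice: uniq only collapses adjacent duplicate lines, so the words must be sorted before counting, and the counted list must be sorted again, numerically this time, to rank the words.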
Each command-line tool used in this one-liner comes with a man page.
So, if you would like to know more about, say, grep, you can
run man grep from the command line. The command-line tools
tr, grep, uniq, and sort will be discussed in more detail in the
next chapter.
There is nothing wrong with running this one-liner just once. However, imagine if we
wanted to find the top 10 words of every ebook on Project Gutenberg. Or imagine
that we wanted the top 10 words of a news website on an hourly basis. In those cases,
it would be best to have this one-liner as a separate building block that can be part of
something bigger. We want to add some flexibility to this one-liner in terms of
parameters, so we'll turn it into a shell script.
Since we use Bash as our shell, the script will be written in the programming language
Bash. This allows us to take the one-liner as the starting point, and gradually improve
on it. To turn this one-liner into a reusable command-line tool, we'll walk you
through the following six steps:
1. Copy and paste the one-liner into a file.
2. Add execute permissions.
3. Define a so-called shebang.
4. Remove the fixed input part.
5. Add a parameter.
6. Optionally extend your PATH.
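Carried out end to end, these steps might look like the following sketch. The filename top-words.sh and the use of ${1:-10} for the parameter's default value are assumptions for illustration, not necessarily the exact code that results from the chapter's walkthrough:

```bash
# Sketch of the six steps above; filename and parameter handling are assumptions.
cat > top-words.sh << 'EOF'
#!/usr/bin/env bash
# Step 3: the shebang tells the kernel which interpreter runs this script.
# Step 4: the fixed curl input is gone; the script now reads from stdin.
tr '[:upper:]' '[:lower:]' |
grep -oE '[a-z]+' |
sort |
uniq -c |
sort -nr |
head -n "${1:-10}"   # Step 5: a parameter for the number of words, defaulting to 10
EOF
chmod +x top-words.sh   # Step 2: add execute permissions

# The tool now works on any input and accepts an optional parameter:
printf 'one fish two fish red fish\n' | ./top-words.sh 1
```

Step 6 would then move or copy the script to a directory on your PATH (for example ~/bin, assuming that directory is on your PATH), so that it can be invoked from anywhere without the ./ prefix.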