It's worth noting that cut can also split on character positions. This is useful when
you want to extract (or remove) the same range of character positions from every input line:
$ grep -i chapter alice.txt | cut -c 9-
I. Down the Rabbit-Hole
II. The Pool of Tears
III. A Caucus-Race and a Long Tale
IV. The Rabbit Sends in a Little Bill
V. Advice from a Caterpillar
VI. Pig and Pepper
VII. A Mad Tea-Party
VIII. The Queen's Croquet-Ground
IX. The Mock Turtle's Story
X. The Lobster Quadrille
XI. Who Stole the Tarts?
XII. Alice's Evidence
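To see how the character range works without alice.txt at hand, here is a minimal sketch on two made-up input lines (the `chapters`, `titles`, and `prefixes` variables are assumptions for illustration). Because "CHAPTER " is eight characters long, `-c 9-` starts at the first character of the Roman numeral:

```shell
# Hypothetical sample lines, standing in for the grep output above.
chapters='CHAPTER I. Down the Rabbit-Hole
CHAPTER II. The Pool of Tears'

# -c 9- keeps everything from the 9th character to the end of each line,
# which drops the 8-character prefix "CHAPTER ".
titles=$(printf '%s\n' "$chapters" | cut -c 9-)
printf '%s\n' "$titles"

# -c 1-7 keeps only the first seven characters of each line.
prefixes=$(printf '%s\n' "$chapters" | cut -c 1-7)
printf '%s\n' "$prefixes"
```

The first command prints the two titles without their prefix; the second prints "CHAPTER" twice.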
grep has a great feature, the -o option, that outputs every match on a separate line:
$ < alice.txt grep -oE '\w{2,}' | head
Project
Gutenberg
Alice
Adventures
in
Wonderland
by
Lewis
Carroll
This
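The same idea can be tried on a single line of text. The following is a small sketch, assuming a grep that supports `\w` in extended regular expressions (GNU and BSD grep both do); the `line` and `words` variables are made up for illustration. Note that the lone "s" left over from "Alice's" is dropped, because the pattern requires at least two word characters:

```shell
# Hypothetical one-line input instead of alice.txt.
line="Alice's Adventures in Wonderland, by Lewis Carroll"

# -o prints only the matched text, one match per line;
# -E enables extended regular expressions, so {2,} means "two or more".
words=$(printf '%s\n' "$line" | grep -oE '\w{2,}')
printf '%s\n' "$words"
```

This prints seven words, one per line, from "Alice" through "Carroll".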
But what if we wanted to create a data set of all the words that start with an "a" and
end with an "e"? Well, of course there's a pipeline for that, too:
$ < alice.txt tr '[:upper:]' '[:lower:]' | grep -oE '\w{2,}' |
> grep -E '^a.*e$' | sort | uniq -c | sort -nr |
> awk '{print $2","$1}' | header -a word,count | head | csvlook
|-------------+--------|
|  word       | count  |
|-------------+--------|
|  alice      | 403    |
|  are        | 73     |
|  archive    | 13     |
|  agree      | 11     |
|  anyone     | 5      |
|  alone      | 5      |
|  age        | 4      |
|  applicable | 3      |
|  anywhere   | 3      |
|  alive      | 3      |
|-------------+--------|
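The pipeline above relies on header and csvlook, which are not part of a standard system. As a rough sketch of the same steps using only common tools, the function below (the name `count_a_e_words` and the sample `text` are assumptions for illustration) writes the header row with echo and emits plain CSV instead of a formatted table. It also sorts ties alphabetically, which the original pipeline does not promise:

```shell
# Hypothetical sample text standing in for alice.txt.
text='Alice was alone. Are we alive? Alice and the age of Alice are a mystery.'

# Read text on stdin, write word,count CSV on stdout.
count_a_e_words() {
  echo 'word,count'                    # header row, written by hand
  tr '[:upper:]' '[:lower:]' |         # normalize to lowercase
    grep -oE '\w{2,}' |                # one word of 2+ characters per line
    grep -E '^a.*e$' |                 # keep words that start with a and end with e
    sort | uniq -c |                   # count occurrences of each word
    sort -k1,1nr -k2,2 |               # highest count first, ties alphabetical
    awk '{print $2","$1}'              # reorder columns to word,count
}

counts=$(printf '%s\n' "$text" | count_a_e_words)
printf '%s\n' "$counts"
# Output:
# word,count
# alice,3
# are,2
# age,1
# alive,1
# alone,1
```

Piping this CSV into csvlook, if it is installed, should give a table like the one above.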