sample has two other purposes, which can be useful when you're debugging.
First, it can add a delay to the output. This comes in handy when the
input is a constant stream (e.g., the Twitter firehose) and the data comes in too fast to
see what's going on. Second, you can put a timer on sample, so that you don't
have to kill the ongoing process manually. To add a 1-second delay (-d 1000) between
output lines of the previous command and to run for only 5 seconds (-s 5):
$ seq 10000 | sample -r 1% -d 1000 -s 5 | jq -c '{line: .}'
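If sample is not installed, the same two ideas (a pause after each line and a time limit) can be approximated with standard tools. The following is only a rough sketch: limited_delay is a hypothetical helper written for this illustration and is not part of sample, and awk 'NR % 100 == 0' is a deterministic stand-in for the random 1% sampling.

```shell
# Hypothetical helper (not part of sample): print each input line, pause
# DELAY seconds after it, and stop once about LIMIT seconds have elapsed.
# usage: limited_delay DELAY LIMIT
limited_delay() {
  start=$(date +%s)
  while IFS= read -r line; do
    # Stop reading as soon as the time limit is reached.
    [ $(( $(date +%s) - start )) -ge "$2" ] && break
    printf '%s\n' "$line"
    sleep "$1"
  done
}

# Keep every 100th line, pause 1 second per line, stop after about 2 seconds.
seq 10000 | awk 'NR % 100 == 0' | limited_delay 1 2
```

Because the clock is read in whole seconds, the time limit is approximate.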
To prevent unnecessary computation, put sample as early as possible in
your pipeline (this advice holds for any command-line tool that reduces data, such as
head and tail). Once you're done debugging, you can simply take it out of the
pipeline.
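To see why the position matters, consider this small sketch, which uses tail as the reducing step and a made-up awk computation (squaring each number) as a placeholder for an expensive one. Both pipelines print the same three lines, but in the second one awk processes only 3 lines instead of 10,000:

```shell
# Reducing step last: awk squares all 10000 numbers, then tail keeps 3.
late=$(seq 10000 | awk '{ print $1 * $1 }' | tail -n 3)

# Reducing step first: tail keeps 3 numbers, and awk squares only those.
early=$(seq 10000 | tail -n 3 | awk '{ print $1 * $1 }')

# Both orderings give the same result here.
[ "$late" = "$early" ] && echo "same output, less work"
```

Swapping the order is only safe here because the awk step treats every line independently. If a later step depends on the whole input (a sort or a running total, say), moving the reducing step changes the result.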
Extracting Values
To extract the actual chapter headings from our example earlier, we can take a simple
approach by piping the output of grep to cut:
$ grep -i chapter alice.txt | cut -d ' ' -f3-
Down the Rabbit-Hole
The Pool of Tears
A Caucus-Race and a Long Tale
The Rabbit Sends in a Little Bill
Advice from a Caterpillar
Pig and Pepper
A Mad Tea-Party
The Queen's Croquet-Ground
The Mock Turtle's Story
The Lobster Quadrille
Who Stole the Tarts?
Alice's Evidence
Here, each line that's passed to cut is split on spaces into fields, and then the third
field through the last field are printed. The total number of fields can differ per
input line. With sed, we can accomplish the same task in a much more complex
manner:
$ sed -rn 's/^CHAPTER ([IVXLCDM]{1,})\. (.*)$/\2/p' alice.txt > /dev/null
(Because the output is the same, it is suppressed by redirecting it to /dev/null.) This
approach uses a regular expression and a back reference. Here, sed also takes over the
work done by grep. This complex approach is only advisable when a simpler one
would not work, for example, if “chapter” were ever part of the text itself and not just
used to indicate the start of a new chapter. Of course, there are intermediate levels of
complexity that would also have worked around this; the point here is to illustrate an
extremely strict approach. In practice, the challenge is to find a good balance between
complexity and flexibility.
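To make the trade-off concrete, here is a small sketch that runs both approaches on a few made-up lines (not the real alice.txt) in which the word “chapter” also appears in the body text. The variable names are chosen for illustration only, and the sed call uses -E, a portable spelling of -r for extended regular expressions:

```shell
# Made-up sample input: two headings and one body line mentioning "chapter".
sample_text='CHAPTER I. Down the Rabbit-Hole
Alice skipped the dull chapter about lessons.
CHAPTER II. The Pool of Tears'

# Simple approach: every line containing "chapter" (in any case) is kept,
# and fields 3 through the last are printed.
simple=$(printf '%s\n' "$sample_text" | grep -i chapter | cut -d ' ' -f3-)

# Strict approach: only lines shaped like "CHAPTER <roman numeral>. <title>"
# match, and the back reference \1 prints just the title.
strict=$(printf '%s\n' "$sample_text" | sed -En 's/^CHAPTER [IVXLCDM]+\. (.*)$/\1/p')

printf 'simple:\n%s\n\nstrict:\n%s\n' "$simple" "$strict"
```

The simple pipeline also emits a fragment of the body line (“the dull chapter about lessons.”), whereas the strict sed expression returns only the two titles.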