--- 1: data/top-5 <- data/top.html -> done in 0.02s
Done (2 steps run).
Now, let's assume that instead of the top 5 ebooks, we want the top 10. We can
set the NUM variable from the command line and run Drake ( Example 6-4 ); because
the workflow defines the variable as NUM:=5, the value supplied on the command line
takes precedence.
Example 6-4. Drake workflow with NUM=10 (02.drake)
$ NUM=10 drake -w 02.drake
The following steps will be run, in order:
1: data/top-10 <- data/top.html [missing output]
Confirm? [y/n] y
Running 1 steps with concurrence of 1...
--- 1. Running (missing output): data/top-10 <- data/top.html
--- 1: data/top-10 <- data/top.html -> done in 0.02s
Done (1 steps run).
As you can see, Drake now only needs to execute the second step, because the output
of the first step, data/top.html, already exists. Again, downloading an HTML file is not
such a big deal, but can you imagine the implications if you were dealing with 10 GB
of data?
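To convince yourself that both runs produced their outputs, you can list the data
directory (an illustrative check; the filenames follow from BASE=data/ in the
workflow, and your listing may differ):
$ ls data/
top-10  top-5  top.html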
Rebuilding Specific Targets
The list of the top 100 ebooks on Project Gutenberg changes daily. We've seen that if
we run the Drake workflow again, the HTML file containing this list is not downloaded
again. Luckily, Drake allows us to run certain steps again so that we can update this
HTML file:
$ drake -w 02.drake '=top.html'
There is a more convenient way than using the output filename to specify which step
you want to execute again. We can add so-called tags to both the inputs and outputs of
steps. A tag starts with a %. It's a good idea to choose a short and descriptive tag name
so that you can easily specify it on the command line. Let's add the tag %html to the
first step and %filter to the second step, as in Example 6-5 .
Example 6-5. Drake workflow with tags (03.drake)
NUM:=5
BASE=data/

top.html, %html <- [-timecheck]
    curl -s 'http://www.gutenberg.org/browse/scores/top' > $OUTPUT

top-$[NUM], %filter <- top.html
    < $INPUT grep -E '^<li>' |
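With these tags in place, you can presumably re-run the download step by referring to
its tag rather than its output filename, following the same target pattern used above
with '=top.html' (a sketch; consult Drake's documentation if your version expects a
different target syntax):
$ drake -w 03.drake '=%html'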