This workflow is as simple as it gets. It doesn't offer any advantages over having our
command in a Bash script. But don't worry, we promise you that it will get more
exciting. For now, let's run Drake and see what it does with our first workflow:
$ drake
The following steps will be run, in order:
1: data/top-5 <- [missing output]
Confirm? [y/n] y
Running 1 steps with concurrence of 1...
--- 0. Running (missing output): data/top-5 <-
--- 0: data/top-5 <- -> done in 0.35s
Done (1 steps run).
Between steps, you may want to remove the file drake.log, the hidden
directory .drake, and any output files to force Drake to start
over.
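The cleanup described in this note can be sketched as a small shell function. The name reset_workflow is a hypothetical helper introduced here for illustration, not something Drake provides; it only removes the files the note mentions, plus whichever output files we pass to it.

```shell
# Hypothetical helper (not part of Drake): remove Drake's bookkeeping and
# the given output files so that the next run starts from scratch.
reset_workflow() {
  rm -f drake.log    # the log file named in the note above
  rm -rf .drake      # the hidden directory named in the note above
  rm -f -- "$@"      # any output files passed as arguments
}

# For the workflow in this section, the only output file is data/top-5.
reset_workflow data/top-5
```

After this, running drake again should report the step as missing its output, as in the first run above.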
If we do not specify a workflow file, then Drake will use ./Drakefile. Drake first
determines which steps need to be run. In our case, the one and only step will be run
because its output is missing, that is, there's no file named data/top-5.
Drake asks for confirmation before it executes these steps. We type y and press <Enter>, and
very soon thereafter we see that Drake is done. Drake did not complain about any
errors in our steps. Let's verify that we have the top 5 topics by looking at the output
file data/top-5:
$ cat data/top-5
1342
76
11
1661
1952
Now we do have the output file. Let's run Drake again:
$ drake
The following steps will be run, in order:
1: data/top-5 <- [no-input step]
Confirm? [y/n] n
Aborted.
As you can see, Drake wants to execute the step again! However, it now gives a
different reason, namely that this is a no-input step ([no-input step]). Drake's default
behavior is to decide whether a step needs to run by looking at the timestamp of
its input. However, because we didn't specify any input, Drake doesn't know whether
or not this step should be run again. We can disable this default timestamp
check, as shown in Example 6-2.
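To make the reasoning above concrete, here is a minimal shell sketch of a timestamp-based decision. It is an illustration under stated assumptions, not Drake's actual implementation: needs_run is a hypothetical function, the labels "missing output" and "no-input step" are borrowed from the output shown above, and the other two labels are our own wording.

```shell
# Sketch only: report why a step would (or would not) be run.
# needs_run is a hypothetical helper, not Drake's code.
# Usage: needs_run OUTPUT [INPUT]
needs_run() {
  output=$1
  input=${2:-}
  if [ ! -e "$output" ]; then
    echo "missing output"            # nothing has been produced yet
  elif [ -z "$input" ]; then
    echo "no-input step"             # no input, so no timestamp to compare
  elif [ "$input" -nt "$output" ]; then
    echo "input newer than output"   # our own label: input changed later
  else
    echo "up to date"                # our own label: nothing to do
  fi
}

needs_run data/top-5
```

Without an input file, the function can never reach the timestamp comparison, which mirrors why Drake keeps proposing to rerun our step.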