Database Reference
In-Depth Information
step may take a very long time. Second, because you want to continue with the same
data. Third, the data may come from an API that has certain rate limits. It would be a
good idea to let one step save the data to a file and then let subsequent steps operate
on that file so that you don't have to make any redundant computations or API calls.
Now, the first reason isn't really a problem in our example because the HTML can be
downloaded fast enough. However, in some cases, the data may come from other
sources or may comprise gigabytes of data.
Every Worklow Starts with a Single Step
In this section, we'll convert the preceding command to a Drake workflow. A work‐
flow is just a text file. You'd usually name this file Drakeile because Drake uses that
file if no other file is specified at the command line. A workflow with just a single step
would look like Example 6-1 .
Example 6-1. A worklow with just a single step (Drakeile)
data/top-5 <-
curl -s 'http://www.gutenberg.org/browse/scores/top' |
grep -E '^<li>' |
head -n 5 |
sed -E "s/.*ebooks\/([0-9]+).*/\\1/" > data/top-5
Let's go through this file. The first line, which contains the arrow pointing to the left,
is our step definition. The left side of this arrow, which says top-5 , is the name or out‐
put of this step. Any inputs to this step would appear on the right side of this arrow,
but because this step has no input, it's empty. Defining inputs and outputs is what
allows Drake to recognize the dependencies between steps, and to figure out whether
and when which steps need to be executed in order to fulfill a certain output. This
output is also known as a target . As you can see, the body of this step is literally our
command from before but then indented.
The arrow ( <- ) denotes the name of the step and its dependencies. More on this
later.
The body is indented.
Select only list items.
Get the first 5 items.
Extract the ID, and save to file top-5 . Note that top-5 was already specified in the
step definition and that 5 has now been used three times. We're going to address
that later.
Search WWH ::




Custom Search