Getting ready
We'll use the same dependencies and inclusions that we used in the Initializing Cascalog and Hadoop for distributed processing recipe. We'll also use the Doctor Who companion data from that recipe.
How to do it…
1. We'll define a new, custom operation to take a date range string and split it into two values. In this dataset, the two years of a range are separated by an en dash (#"\u2013"). If the input isn't a range (that is, it's just a year), then the year is returned for both the start and end of the range:
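(defmapfn split-range [date-range]
  ;; Split on the en dash into at most two parts: the start and the end.
  (let [[from to] (string/split (str date-range) #"\u2013" 2)]
    ;; With no end part, reuse the start year. Otherwise, prefix the end
    ;; part with the first two characters (the century) of the start year.
    [from (if (nil? to) from (str (.substring from 0 2) to))]))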
2. Then we can use this to transform the tenure dates in the actors' data:
user=> (?<- (stdout)
            [?n ?name ?from ?to]
            (actor ?n ?name ?range)
            (split-range ?range :> ?from ?to))
…
RESULTS
-----------------------
1 William Hartnell 1963 1966
2 Patrick Troughton 1966 1969
3 Jon Pertwee 1970 1974
…
How it works…
In the split-range operator, we return a vector containing two years for the output. Then, in the query, we use the :> operator to bind the output values from split-range to the names ?from and ?to. Destructuring makes this especially easy.
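
To see the splitting logic on its own, outside of a Cascalog query, the body of split-range can be tried as an ordinary Clojure function. This is only a sketch: the name split-range-plain is hypothetical, and the sample inputs are assumptions inferred from the code (an end year written without its century), not values copied from the dataset.

```clojure
(require '[clojure.string :as string])

;; Hypothetical plain-function copy of the split-range body, for REPL testing.
(defn split-range-plain [date-range]
  ;; Destructure the result of the split; `to` is nil when there is no en dash.
  (let [[from to] (string/split (str date-range) #"\u2013" 2)]
    [from (if (nil? to) from (str (.substring from 0 2) to))]))

;; Assumed input shapes: a range with an abbreviated end year, a single
;; year as a string, and a single year as a number.
(split-range-plain "1963\u201366") ; => ["1963" "1966"]
(split-range-plain "1989")         ; => ["1989" "1989"]
(split-range-plain 1996)           ; => ["1996" "1996"]
```

Because the function always returns a two-element vector, the query can bind both outputs with :> regardless of whether the input was a range or a single year. Note that an end year already written with four digits would not fit this logic, since the century prefix is always added.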