Also, we'll need to import a number of namespaces from these libraries into our script
or REPL:
(require '[cascalog.logic.ops :as c]
         '[cascalog.cascading.tap :as tap]
         '[cascalog.cascading.util :as u])
(use 'cascalog.api)
(import [cascading.tuple Fields]
        [cascading.scheme.hadoop TextDelimited])
We'll also use the data file that we used in the Distributing data with Apache HDFS recipe.
You can access it either locally or through HDFS, as we did earlier. I'll access it locally for
this recipe.
How to do it…
1. We just need to write a function that creates a cascading.scheme.hadoop.TextDelimited
tap scheme with the correct options and then passes it to Cascalog's
cascalog.cascading.tap/hfs-tap function. That will handle the rest, as shown here:
(defn hfs-text-delim
  [path & {:keys [fields has-header delim quote-str]
           :as opts
           :or {fields Fields/ALL, has-header false, delim ",",
                quote-str "\""}}]
  ;; Build the delimited-text scheme from the parsing options.
  (let [scheme (TextDelimited. (u/fields fields) has-header delim
                               quote-str)
        ;; Pass only the tap-level options through to hfs-tap.
        tap-opts (select-keys opts [:sinkmode
                                    :sinkparts
                                    :source-pattern
                                    :sink-template
                                    :templatefields])]
    ;; Flatten the option map into alternating keys and values.
    (apply tap/hfs-tap scheme path (apply concat tap-opts))))
2. Now, let's try this out:
user=> (?<- (stdout)
            [?origin_airport ?destin_airport]
            ((hfs-text-delim "data/16285/flights_with_colnames.csv"
                             :has-header true)
             ?origin_airport ?destin_airport ?passengers ?flights
             ?month))
RESULTS
-----------------------
MHK AMW
EUG RDM
EUG RDM
EUG RDM
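The function in step 1 relies on a general Clojure pattern: it accepts keyword options, keeps some for itself, and forwards the rest to another function that also takes keyword options. The following is a minimal sketch of that pattern in plain Clojure, with no Cascalog dependency. The names fake-tap and forward-opts are hypothetical stand-ins invented for illustration; fake-tap only returns what it receives and is not part of Cascalog or Cascading.

```clojure
;; Hypothetical stand-in for a function that takes keyword options.
;; It simply returns its arguments so we can inspect them.
(defn fake-tap
  [scheme path & {:as options}]
  {:scheme scheme, :path path, :options options})

;; Hypothetical wrapper: uses :delim itself and forwards only the
;; whitelisted options to fake-tap.
(defn forward-opts
  [path & {:keys [delim] :as opts :or {delim ","}}]
  (let [tap-opts (select-keys opts [:sinkmode :sinkparts])]
    ;; A map is a sequence of [key value] pairs, so it has to be
    ;; flattened into alternating keys and values before `apply`
    ;; can pass them along as keyword arguments.
    (apply fake-tap [:text-delimited delim] path (apply concat tap-opts))))

(forward-opts "data.csv" :delim "\t" :sinkmode :replace)
;; => {:scheme [:text-delimited "\t"], :path "data.csv", :options {:sinkmode :replace}}
```

Note that the :or defaults only affect the locally bound names such as delim. They are not added to the map bound by :as, so a default value is never forwarded unless the caller passes it explicitly.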
 