cascalog-user email forum or tweet to #Cascalog on Twitter. Very helpful developers
are available to assist.
Example 1 in Cascalog: Simplest Possible App
The tutorial examples show Cascalog code snippets run in the interactive REPL. Next,
let's use Leiningen (lein) to build a “fat jar” that can also run on an Apache Hadoop cluster.
Paul Lam of uSwitch has translated each app in the “Impatient” series of Cascading apps
into Cascalog, and some of the translations are more expressive than the originals.
Change to a directory where you have space for downloads, and then use Git to clone the
Cascalog version of “Impatient” on GitHub:
$ git clone git://github.com/Quantisan/Impatient.git
Change into the part1 subdirectory. Then we'll review a Cascalog app for a distributed
file copy, similar to “Example 1: Simplest Possible App in Cascading”:
$ cd Impatient/part1
Source is located in the src/impatient/core.clj file:
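(ns impatient.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])      ; TSV tap helper
  (:gen-class))                                          ; AOT-compiled class whose main calls -main
(defn -main [in out & args]
  (?<- (hfs-delimited out)                               ; sink tap: write TSV to out
       [?doc ?line]                                      ; fields to emit
       ((hfs-delimited in :skip-header? true) ?doc ?line))) ; source tap: read TSV from in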
The first four lines, which begin with the ns macro, define a namespace. Java and Scala
use packages and imports for similar purposes, but Clojure namespaces are generally
more flexible. For example, they give finer control over naming collisions: a required
namespace can be given an alias, or only selected names can be referred, as :only does
here. Namespaces are also first-class values in Clojure, so they can be created and
manipulated at runtime. In this example, the namespace pulls in the Cascalog API, plus
additional definitions for Cascading taps, such as a tap based on TextDelimited for TSV format.
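The namespace features described above can be sketched in plain Clojure, without Cascalog. This is a minimal illustration under stated assumptions: the namespace names demo.namespaces and demo.scratch, and the helper functions split, build-namespace, and lookup, are hypothetical and do not come from the Impatient repository. The :as alias keeps clojure.string/split distinct from a local function with the same name, while create-ns, intern, and ns-resolve treat a namespace as an ordinary runtime value.

```clojure
;; Hypothetical demo namespace, not part of the Impatient repository.
(ns demo.namespaces
  ;; Alias clojure.string as str: its functions are reached as
  ;; str/split, str/join, and so on, so they never collide with
  ;; same-named functions defined in this namespace.
  (:require [clojure.string :as str]))

;; A local function deliberately named like clojure.string/split.
;; No collision: the library version is only reachable via the alias.
(defn split
  "Split a TSV row into a vector of its tab-separated fields."
  [row]
  (str/split row #"\t"))

(defn build-namespace
  "Create (or reuse) a namespace at runtime and define a var in it,
  showing that namespaces are ordinary values. Returns the namespace."
  [ns-sym var-sym value]
  (let [n (create-ns ns-sym)]
    (intern n var-sym value)
    n))

(defn lookup
  "Resolve var-sym in the namespace value n and return the var's
  current value, or nil if the symbol is not mapped there."
  [n var-sym]
  (some-> (ns-resolve n var-sym) deref))
```

For instance, (lookup (build-namespace 'demo.scratch 'greeting "hi") 'greeting) returns "hi", even though demo.scratch never appears in any source file.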
The next four lines, which begin with the defn macro, define a function, -main, which is
analogous to the main method in “Example 1: Simplest Possible App in Cascading”. It takes
an in argument for the source tap identifier and an out argument for the sink tap
identifier, plus a variadic & args parameter that collects any additional command-line
arguments. The ?<- operator defines and runs a query in one step; this query writes output
in TSV format for each tuple of ?doc and ?line fields from the input tuple stream. Note the
option :skip-header? set to true, which makes the source tap skip the header row of the
input TSV data.
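To make the query's behavior concrete, here is a local, in-memory approximation in plain Clojure. It is a sketch, not Cascalog: it uses no taps and no Hadoop, and the namespace impatient.local-sketch and the functions copy-tsv and main-args are hypothetical names for illustration. copy-tsv drops the header row (the effect of :skip-header? true), binds the two tab-separated fields of each row to doc and line, and writes them back out as TSV. Splitting with a limit of 2, so that any tab inside the text stays in line, is an assumption of this sketch. main-args mirrors the -main parameter list to show how & args gathers extra arguments.

```clojure
;; Hypothetical local sketch; NOT how Cascalog executes the query.
(ns impatient.local-sketch
  (:require [clojure.string :as str]))

(defn copy-tsv
  "Approximate the part1 query on a TSV string held in memory:
  skip the header row, split each remaining row into doc and line,
  and join the fields back into TSV rows separated by newlines."
  [tsv-text]
  (->> (str/split-lines tsv-text)
       rest                                   ; like :skip-header? true
       (map #(str/split % #"\t" 2))           ; bind [?doc ?line]
       (map (fn [[doc line]] (str doc "\t" line)))
       (str/join "\n")))

(defn main-args
  "Mirror the -main signature: in and out are required, and & args
  gathers any further arguments into a seq (nil when there are none)."
  [in out & args]
  {:in in :out out :extra args})
```

On a cluster, Cascalog runs the real query as a distributed job instead, so the output lands in part files under the out path and row order may not match the input.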