Cascalog is a DSL in Clojure that implements first-order predicate logic for large-scale
queries based on Cascading. This work originated at a company called BackType, which
was subsequently acquired by Twitter.
Clojure is a dialect of Lisp intended for functional programming and parallel processing.
The name “Cascalog” is a portmanteau of CASCading and datALOG. Through the
Leiningen build system, you can also run Cascalog in an interactive prompt called a
Read-Evaluate-Print Loop (REPL). This is a powerful combination, because a developer can
test snippets with sample data in the REPL, then compile to a JAR file for
production use on a Hadoop cluster.
Getting Started with Cascalog
The best resources for getting started with Cascalog are the project wiki and API
documentation on GitHub.
In addition to Git and Java, which were set up in Chapter 1, you will need to have a tool
called Leiningen installed for the examples in this chapter. Make sure that you have Java
1.6, and then read the steps given on the wiki page.
Our example shows using ~/bin as a target directory for the installation of lein, but
you could use any available location on your system:
$ export LEIN_HOME=~/bin
$ mkdir -p $LEIN_HOME
$ cd $LEIN_HOME
$ wget https://raw.github.com/technomancy/leiningen/preview/bin/lein
$ chmod 755 lein
$ export PATH=$LEIN_HOME:$PATH
$ export JAVA_OPTS=-Xmx768m
That downloads the lein script, makes it executable, and adds it to your PATH
environment variable. The script will update itself later.
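The export lines above last only for the current shell session. As a minimal sketch, assuming a Bourne-compatible shell, the helper below adds the directory to PATH only when it is not already listed, so the snippet could be sourced repeatedly from a startup file without growing PATH. The function name prepend_path is hypothetical and is not part of Leiningen:

```shell
# Hypothetical helper, not part of Leiningen: prepend a directory to PATH
# only if it is not already listed, so repeated sourcing does not grow PATH.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;        # otherwise put the directory first
  esac
}

LEIN_HOME="$HOME/bin"            # same location as in the steps above
prepend_path "$LEIN_HOME"
export LEIN_HOME PATH
```

Wrapping PATH in colons before matching avoids false positives on directories that merely share a prefix, such as ~/bin2.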
Leiningen provides a build system for Clojure, along with an interactive prompt for
evaluating ad hoc queries. Test your installation of lein with the following:
$ lein
Leiningen is a tool for working with Clojure projects.
There will probably be much more usage text printed out.
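If the shell reports that lein cannot be found instead, the installation directory is probably missing from PATH. A small sketch of a check, using the POSIX command -v builtin; the function name check_tool is hypothetical:

```shell
# Hypothetical helper: report where a command resolves from, or fail
# with a message on stderr if the shell cannot find it on PATH.
check_tool() {
  if tool_path=$(command -v "$1"); then
    echo "$1 found at $tool_path"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# Check for lein; on failure, point back at the PATH export step.
check_tool lein || echo "re-check the PATH export above"
```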
Now change to a directory where you have space for downloads, and then use Git to clone
the latest update from the master branch of the Cascalog project on GitHub:
$ git clone git://github.com/nathanmarz/cascalog.git
Change into that newly cloned directory and run the following steps with lein to get
Cascalog set up: