Cascalog is a DSL in Clojure that implements first-order predicate logic for large-scale
queries based on Cascading. This work originated at a company called BackType, which
was subsequently acquired by Twitter.
Clojure is a dialect of Lisp intended for functional programming and parallel processing.
The name “Cascalog” is a portmanteau of CASCading and datALOG. Through the
Leiningen build system, you can also run Cascalog at an interactive prompt called a
REPL (Read-Evaluate-Print Loop). This is a powerful combination, because a developer
can test snippets with sample data in the REPL, then compile to a JAR file for
production use on a Hadoop cluster.
Getting Started with Cascalog
The best resources for getting started with Cascalog are the project wiki and API
documentation on GitHub.
In addition to Git and Java, which were set up in Chapter 1, you will need to have a tool
called Leiningen installed for the examples in this chapter. Make sure that you have Java
Our example uses ~/bin as the target directory for the installation of lein, but
you could use any available location on your system:
$ export LEIN_HOME=~/bin
$ mkdir -p $LEIN_HOME
$ cd $LEIN_HOME
$ wget https://raw.github.com/technomancy/leiningen/preview/bin/lein
$ chmod 755 lein
$ export PATH=$LEIN_HOME:$PATH
$ export JAVA_OPTS=-Xmx768m
That downloads the lein script, makes it executable, and adds it to your PATH
environment variable. The script will update itself later.
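If you put these settings in a shell startup file, the export of PATH shown above would prepend the same directory each time the file is sourced. The following is a minimal sketch of one way to avoid that; the helper name prepend_path is hypothetical (it is not part of Leiningen), and the fallback of $HOME/bin simply mirrors the ~/bin location assumed in this example.

```shell
# Sketch: prepend a directory to PATH only if it is not already listed.
# prepend_path is a hypothetical helper name, not part of Leiningen.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;             # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;    # otherwise put the directory first
  esac
}

# Assumption: fall back to ~/bin when LEIN_HOME has not been set yet.
LEIN_HOME="${LEIN_HOME:-$HOME/bin}"
prepend_path "$LEIN_HOME"
export LEIN_HOME PATH
```

Calling prepend_path a second time with the same directory leaves PATH unchanged, so the snippet can be sourced repeatedly.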
This provides a build system for Clojure, along with an interactive prompt for evaluating
ad hoc queries. Test your installation of lein with the following:
$ lein
Leiningen is a tool for working with Clojure projects.
There will probably be much more usage text printed out.
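Before going further, it may help to confirm that all three tools this chapter relies on (Git, Java, and Leiningen) can be found on your PATH. The check below is only a sketch: have_tool is a hypothetical helper name, and it tests only that a command can be located, not which version is installed.

```shell
# Sketch: report whether each prerequisite command can be found on PATH.
# have_tool is a hypothetical helper name; it does not check versions.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

for tool in git java lein; do
  if have_tool "$tool"; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```

Any tool reported as missing needs to be installed, or its directory added to PATH, before the remaining steps will work.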
Now change to a directory where you have space for downloads, and then use Git to clone the
latest update from the master branch of the Cascalog project on GitHub:
$ git clone git://github.com/nathanmarz/cascalog.git
Change into that newly cloned directory and run the following steps with lein to get
Cascalog set up: