Distributed Data Processing with Cascalog - Clojure Data Analysis


3. Once this is in place, we create a JAR file containing this file and all of its dependencies:

$ lein uberjar

Created /Users/err8n/p/cljbook/distrib-data/target/distrib-data-0.1.0.jar

Created /Users/err8n/p/cljbook/distrib-data/target/distrib-data-0.1.0-standalone.jar

If you're using Windows, Mac, or another OS with a case-insensitive filesystem, you'll need to remove the LICENSE file from the JAR, because it will clash with a license directory. To do this, you can use this command:

$ zip -d target/distrib-data-0.1.0-standalone.jar META-INF/LICENSE

deleting: META-INF/LICENSE

4. Now, we can start the Clojure REPL from within the Hadoop-controlled grid of computers, using the hadoop command on the JAR file we just created:

$ hadoop jar target/distrib-data-0.1.0-standalone.jar

Clojure 1.6.0

user=>

5. Inside the REPL that just started, we need to import the libraries that we're going to use:

user=> (require '[clojure.string :as string]
                '[cascalog.logic.ops :as c])

nil

user=> (use 'cascalog.api)

nil

6. Finally, once this is in place, we can execute the Cascalog query to read the flights_with_colnames.csv file from HDFS:

user=> (?<- (stdout) [?line]
            ((hfs-textline "hdfs:///user/err8n/flights_with_colnames.csv")
             :> ?line))

…

RESULTS

origin_airport,destin_airport,passengers,flights,month

MHK,AMW,21,1,200810

EUG,RDM,41,22,199011

EUG,RDM,88,19,199012

EUG,RDM,11,4,199010

…
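
Each ?line bound by the query above is a raw comma-separated string, and the header row comes through along with the data. As a rough sketch of what could come next, the plain Clojure below splits such a line into named fields using clojure.string, the namespace required in step 5. The helper names parse-row and header-line? are hypothetical and are not part of Cascalog or of the recipe itself. The column names are taken from the header row in the output, and the assumption that the last three columns always hold integers is based only on the sample rows shown.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: true for the CSV header line, false for data rows.
;; Uses Java interop so that it also works on older Clojure releases.
(defn header-line?
  [^String line]
  (.startsWith line "origin_airport"))

;; Hypothetical helper: splits one data line into a map.
;; Assumes exactly five comma-separated fields, the last three integers.
(defn parse-row
  [line]
  (let [[origin dest passengers flights month] (string/split line #",")]
    {:origin-airport origin
     :destin-airport dest
     :passengers     (Long/parseLong passengers)
     :flights        (Long/parseLong flights)
     :month          (Long/parseLong month)}))

;; Sample lines copied from the query output above.
(def sample-lines
  ["origin_airport,destin_airport,passengers,flights,month"
   "MHK,AMW,21,1,200810"
   "EUG,RDM,41,22,199011"])

;; Drop the header, then parse the remaining rows.
(def parsed-rows
  (->> sample-lines
       (remove header-line?)
       (map parse-row)))
```

Because these are ordinary functions with no cluster dependency, they can be tried at any REPL before being wired into a query. Keeping the header check separate from the parsing means a row with non-numeric fields fails loudly in Long/parseLong rather than being skipped silently.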
