:distribution :repo }
  :uberjar-name "copa.jar"
  :aot [ copa.core ]
  :main copa.core
  :min-lein-version "2.0.0"
  :source-paths [ "src/main/clj" ]
  :dependencies [[ org.clojure/clojure "1.4.0" ]
                 [ cascalog "1.10.1-SNAPSHOT" ]
                 [ cascalog-more-taps "0.3.1-SNAPSHOT" ]
                 [ clojure-csv/clojure-csv "2.0.0-alpha2" ]
                 [ org.clojars.sunng/geohash "1.0.1" ]
                 [ date-clj "1.0.1" ]]
  :exclusions [ org.clojure/clojure ]
  :profiles { :dev { :dependencies [[ midje-cascalog "0.4.0" ]]}
              :provided { :dependencies
                          [[ org.apache.hadoop/hadoop-core "0.20.2-dev" ]]
                        }})
To build this sample app from a command line, run Leiningen:
$ lein clean
$ lein uberjar
That builds a “fat jar” that includes all the libraries for the Cascalog app. Next, we clear
any previous output directory (Hadoop refuses to write into an output directory that
already exists), then run the app in standalone mode:
$ rm -rf out/
$ hadoop jar ./target/copa.jar \
data/copa.csv data/meta_tree.tsv data/meta_road.tsv data/gps.csv \
out/trap out/park out/tree out/road out/shade out/gps out/reco
The recommender results will be in partition files in the out/reco/ directory. A gist on
GitHub shows building and running this app. If your results look similar, you should
be good to go.
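For a quick look at those results, a small shell helper along the following lines can print the first few records from each partition file. This is only a sketch: the function name peek_parts is hypothetical and not part of the sample app, and it assumes the output files follow Hadoop's usual part-* naming.

```shell
# peek_parts DIR [N]: print the first N lines (default 5) of each
# partition file in DIR. Hypothetical helper, not part of the sample app.
peek_parts() {
  pp_dir="$1"
  pp_n="${2:-5}"
  pp_found=1
  for pp_file in "$pp_dir"/part-*; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -f "$pp_file" ] || continue
    pp_found=0
    echo "== $pp_file"
    head -n "$pp_n" "$pp_file"
  done
  # Return non-zero when no partition files were found.
  return "$pp_found"
}
```

For example, peek_parts out/reco 10 would show up to ten records per partition file, and the non-zero return status when nothing matches makes it easy to spot a run that produced no output.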
Alternatively, if you want to run this app on the Amazon AWS cloud, the steps are the
same as for “Example 3 in Scalding: Word Count with Customized Operations” on page
54. First you'll need to sign up for the EMR and S3 services, and also have your
credentials set up in the local configuration, for example in your ~/.aws_cred/ directory.
Edit the emr.sh Bash script to use one of your S3 buckets, and then run that script from
your command line.
Key Points of the Recommender Workflow
This workflow illustrates some of the key points of building Enterprise data workflows:
1. Typically a workflow starts with some kind of ETL, loading unstructured data—
which we see for the GIS export and GPS log files.