Case Study: City of Palo Alto Open Data - Enterprise Data Workflows with Cascading

            :distribution :repo }
  :uberjar-name "copa.jar"
  :aot [ copa.core ]
  :main copa.core
  :min-lein-version "2.0.0"
  :source-paths [ "src/main/clj" ]
  :dependencies [[ org.clojure/clojure "1.4.0" ]
                 [ cascalog "1.10.1-SNAPSHOT" ]
                 [ cascalog-more-taps "0.3.1-SNAPSHOT" ]
                 [ clojure-csv/clojure-csv "2.0.0-alpha2" ]
                 [ org.clojars.sunng/geohash "1.0.1" ]
                 [ date-clj "1.0.1" ]]
  :exclusions [ org.clojure/clojure ]
  :profiles { :dev { :dependencies [[ midje-cascalog "0.4.0" ]]}
              :provided { :dependencies
                          [[ org.apache.hadoop/hadoop-core "0.20.2-dev" ]]
                          }})

To build this sample app from a command line, run Leiningen:

$ lein clean
$ lein uberjar

That builds a “fat jar” that includes all the libraries for the Cascalog app. Next, we clear any previous output directory (Hadoop will not overwrite an existing one), then run the app in standalone mode:

$ rm -rf out/
$ hadoop jar ./target/copa.jar \
  data/copa.csv data/meta_tree.tsv data/meta_road.tsv data/gps.csv \
  out/trap out/park out/tree out/road out/shade out/gps out/reco
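When iterating on the app, it can be convenient to wrap the build, cleanup, and run commands in a single shell function. The following is only a sketch: the function name run_copa and the DRY_RUN switch are hypothetical and not part of the sample project, while the commands themselves are the ones shown above.

```shell
#!/usr/bin/env bash
# Sketch: rebuild copa.jar and rerun the workflow in Hadoop standalone mode.
# run_copa and DRY_RUN are hypothetical names introduced for illustration.

run_copa() {
  local run=""
  # With DRY_RUN=1, print each command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi

  # Build the fat jar.
  $run lein clean || return 1
  $run lein uberjar || return 1

  # Remove any previous output directory before rerunning.
  $run rm -rf out/ || return 1

  # Four input files first, then the seven output directories.
  $run hadoop jar ./target/copa.jar \
    data/copa.csv data/meta_tree.tsv data/meta_road.tsv data/gps.csv \
    out/trap out/park out/tree out/road out/shade out/gps out/reco
}
```

Calling `DRY_RUN=1 run_copa` prints the four commands without executing them, which is a cheap way to check the argument order before starting a real run.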

The recommender results will be in partition files in the out/reco/ directory. A gist on GitHub shows building and running this app. If your results look similar, you should be good to go.
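To take a quick look at those partition files, a small helper along the following lines may be useful. It is a sketch under two assumptions: the files follow the usual Hadoop part-* naming, and preview_reco is a hypothetical function name rather than anything shipped with the app. The layout of each output row is not assumed.

```shell
#!/usr/bin/env bash
# Sketch: show the first rows of the partition files in an output directory.
# preview_reco is a hypothetical helper; part-* naming is an assumption.

preview_reco() {
  local dir="${1:-out/reco}"   # output directory to inspect
  local rows="${2:-10}"        # number of rows to print

  # Fail early if the directory holds no partition files.
  if ! ls "$dir"/part-* >/dev/null 2>&1; then
    echo "no partition files found in $dir" >&2
    return 1
  fi

  # Concatenate the partitions in name order and keep the first rows.
  cat "$dir"/part-* | head -n "$rows"
}
```

For example, `preview_reco out/reco 5` prints the first five rows, and the same function works for the intermediate outputs such as out/tree or out/road.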

Alternatively, if you want to run this app on the Amazon AWS cloud, the steps are the same as for “Example 3 in Scalding: Word Count with Customized Operations” on page 54. First you'll need to sign up for the EMR and S3 services, and also have your credentials set up in the local configuration, for example in your ~/.aws_cred/ directory. Edit the emr.sh Bash script to use one of your S3 buckets, and then run that script from your command line.

Key Points of the Recommender Workflow

This workflow illustrates some of the key points of building Enterprise data workflows:

1. Typically a workflow starts with some kind of ETL, loading unstructured data, which we see for the GIS export and GPS log files.
