$ rm -rf output
$ scald.rb --hdfs-local src/main/scala/Example3.scala \
--doc data/rain.txt --wc output/wc
In the output, the first 10 lines (including the header) should look like this:
$ head output/wc/part-00000
token count
a 8
air 1
an 1
and 2
area 4
as 2
australia 1
back 1
broken 1
A gist on GitHub shows building and running this app. If your run looks terribly
different, something is probably not set up correctly. Ask the developer community for
troubleshooting advice.
A Word or Two about Functional Programming
At the mention of functional programming, Java is not quite the first programming
language that comes to mind. Cascading, however, with its pattern language and plumbing
metaphor, borrows much from the functional programming paradigm. For example,
there is no concept of “mutable variables” in the context of a flow, just the stream of
data tuples.
Scalding integrates Cascading within Scala, which includes many functional programming
features. The name “Scalding” is a portmanteau of SCALa and cascaDING. Formally,
Scalding is a DSL embedded in Scala that binds to Cascading. A DSL is a language
dedicated to a particular kind of problem and solution. The Scala language was designed
in part to support a wide variety of DSLs. The domain for Scalding is about how to
express robust, large-scale data workflows that run on parallel processing frameworks,
typically for machine learning use cases.
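To make the "no mutable variables" idea concrete, here is a minimal sketch of a word count written with nothing but the Scala standard collections library. It is not the Scalding job from Example3.scala and does not use Cascading. The object name WordCountSketch, the tokenizing regex, and the two sample sentences are made up for illustration.

```scala
// Plain Scala collections only; no Scalding or Cascading dependency is assumed.
object WordCountSketch {
  // Split each line on whitespace and common punctuation, lowercase the
  // tokens, drop empty ones, then group identical tokens and count them.
  // Every step returns a new immutable collection; nothing is reassigned.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("[\\s\\[\\](),.]+"))
      .map(_.trim.toLowerCase)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (token, occurrences) => (token, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Hypothetical input, not taken from data/rain.txt.
    val lines = Seq("A rain shadow is a dry area", "an area of dry land")
    // Print tab-separated token/count pairs, sorted by token.
    wordCount(lines).toSeq.sortBy(_._1).foreach { case (token, count) =>
      println(s"$token\t$count")
    }
  }
}
```

The chain of flatMap, map, filter, and groupBy reads like a pipeline: each stage consumes the previous stage's output, which is loosely the same shape as tuples moving through a flow.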
Avi Bryant, author of Scalding, introduced his talk at the Strata 2012 conference with a
special recipe:
Start on low heat with a base of Hadoop; map, then reduce. Flavor, to taste, with Scala's
concise, functional syntax and collections library. Simmer with some Pig bones: a tuple
model and high-level join and aggregation operators. Mix in Cascading to hold everything
together and boil until it's very, very hot, and you get Scalding, a new API for
MapReduce out of Twitter.
— Avi Bryant
Scala + Cascading = Scalding (2012)