CHAPTER 4
Scalding—A Scala DSL for Cascading
Why Use Scalding?
Cascading represents a pattern language where we use a “plumbing” metaphor with
pipes and operators to build workflows. As the sample code in the previous chapter shows,
the Java source requires much more detail than simply pipes and operators. Even so, we
can use conceptual flow diagrams to keep track of the plumbing—the actual logic of
what is being performed by a workflow. What if we could simply write code at the level
of detail in those diagrams?
Scalding is a domain-specific language (DSL) in the Scala programming language,
which integrates Cascading. The functional programming paradigm used in Scala is
much closer than Java to the original model for MapReduce. Consequently, Scalding
source code for workflows has a nearly 1:1 correspondence with the concise visual descriptions
in our conceptual flow diagrams. In other words, developers work directly
in the plumbing of pipes, where the pattern language becomes immediately visible. That
aspect alone is a major advantage for software engineering with very large-scale
data. Apps written in Java with the Cascading API almost seem like assembly language
programming in comparison. Plus, Scala offers other advanced programming models
used in large-scale enterprise work, such as the actor model for concurrency.
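To make the connection between functional programming and the MapReduce model concrete, here is a minimal sketch of a word count written with plain Scala collections. It does not use the Scalding or Cascading APIs; the names WordCountSketch, tokenize, and wordCount are hypothetical and chosen only for illustration. The point is that the map, group, and reduce steps each appear as a single line, much like the boxes in a conceptual flow diagram.

```scala
// Illustrative sketch only: plain Scala collections, no Scalding or Cascading dependency.
// All names here (WordCountSketch, tokenize, wordCount) are assumptions for this example.
object WordCountSketch {

  // "Map" step: split one line into lowercase word tokens, dropping empty strings.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  // "Group" and "reduce" steps: collect identical words, then count each group.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)                 // one token stream from all lines
      .groupBy(identity)                 // word -> all of its occurrences
      .map { case (word, occurrences) => (word, occurrences.size) }
}
```

For instance, calling WordCountSketch.wordCount(Seq("a rose is a rose")) yields a map in which "a" and "rose" each have a count of 2 and "is" has a count of 1. A Scalding job expresses the same kind of pipeline over distributed data rather than an in-memory sequence.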
While Scalding builds on Cascading, other libraries build atop Scalding—including
support for type-safe libraries, abstract algebra, very large sparse matrices, etc., which
are used to implement distributed algorithms and robust infrastructure for data services.
For example, simple operations such as calculating a running median can become hard
problems when you are servicing hundreds of millions of customers with tight requirements
for service-level agreements (SLAs). A running median is an example of a metric
needed in anti-fraud classifiers, social recommenders, customer segmentation, etc.
Scalding offers simple, concise ways to implement distributed algorithms for that kind
 