CHAPTER 7
The Workflow Abstraction
Key Insights
Thus far, we have looked at several examples of how to use Cascading. Now let's step
back a bit and take a look at some of the theory at its foundation.
The author of Cascading, Chris Wensel, was working at a large firm well known for
many data products. Wensel was evaluating the Nutch project, which included Lucene
and subsequently Hadoop, to see how to leverage these open source technologies
for Big Data within an Enterprise environment. His takeaway was that it would
be difficult to find enough Java developers who could write complex Enterprise apps
directly in MapReduce.
An obvious response would have been to build some kind of abstraction layer atop
Hadoop. Many different variations of this have been developed over the years, and that
approach dates back to the many “fourth-generation languages” (4GL) starting in the
1970s. However, another takeaway Wensel had from the early days of Apache Hadoop
use was that abstraction layers built by and for the early adopters typically would not
pass the “bench test” for Enterprise. The operational complexity of large-scale apps and
the need to leverage many existing software engineering practices would be difficult if
not impossible to manage through a 4GL-styled abstraction layer.
A key insight into this problem was that MapReduce is based on the functional
programming paradigm. In the original MapReduce paper by Jeffrey Dean and Sanjay
Ghemawat at Google, the authors made clear that a functional programming model
allowed for the following:
 