Flexibility
Let's take a step back and see what this new model has given us — or better yet, what it
has taken away.
You see, we no longer think in terms of MapReduce jobs, or of Mapper and Reducer
interface implementations and how to bind or link each MapReduce job to the ones that
precede it. At runtime, the Cascading “planner” figures out the optimal way to
partition the pipe assembly into MapReduce jobs and manages the linkages between them
(Figure 24-8).
Figure 24-8. How a Flow translates to chained MapReduce jobs
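To make the idea of partitioning concrete, here is a toy model in Python. It is not Cascading code and does not reproduce the real planner's algorithm. It assumes a strictly linear pipe assembly and one simplified rule: every grouping operation needs its own shuffle, so a second grouping forces a new job. The names `plan`, `Job`, and the operation strings are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical operation kinds that require a shuffle in this toy model.
# These are illustrative strings, not Cascading class names.
GROUPING_OPS = {"group_by", "co_group"}


@dataclass
class Job:
    """One simulated MapReduce job: operations run map-side, then reduce-side."""
    map_ops: List[str] = field(default_factory=list)
    reduce_ops: List[str] = field(default_factory=list)


def plan(assembly: List[str]) -> List[Job]:
    """Split a linear list of operations into a chain of jobs.

    Assumed rule: operations before the first grouping run in the map phase;
    a grouping and everything after it run in the reduce phase; the next
    grouping closes the current job and opens a new one.
    """
    jobs: List[Job] = []
    current = Job()
    in_reduce = False
    for op in assembly:
        if op in GROUPING_OPS:
            if in_reduce:
                # A job has only one shuffle, so start a new job here.
                jobs.append(current)
                current = Job()
            in_reduce = True
            current.reduce_ops.append(op)
        elif in_reduce:
            current.reduce_ops.append(op)
        else:
            current.map_ops.append(op)
    if current.map_ops or current.reduce_ops:
        jobs.append(current)
    return jobs


if __name__ == "__main__":
    assembly = ["parse", "filter", "group_by", "count", "group_by", "sum"]
    for number, job in enumerate(plan(assembly), start=1):
        print(f"job {number}: map={job.map_ops} reduce={job.reduce_ops}")
```

In this model the developer only writes the flat list of operations; the job boundaries fall out of the planning step. A job with an empty map side stands for one that simply reads the previous job's output before shuffling.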
Because of this, developers can build applications of arbitrary granularity. They can start
with a small application that just filters a logfile, then iteratively build more features into
the application as needed.
Because Cascading is an API rather than a syntax such as strings of SQL, it is more
flexible. First, developers can create domain-specific languages (DSLs) using their
favorite languages, such as Groovy, JRuby, Jython, Scala, and others (see the project
site for examples). Second, developers can extend various parts of Cascading, for
example to allow custom Thrift or JSON objects to be read, written, and passed
through the tuple stream.