Cascading - Hadoop: The Definitive Guide

Flexibility

Let's take a step back and see what this new model has given us, or better yet, what it has taken away.

You see, we no longer think in terms of MapReduce jobs, or of Mapper and Reducer interface implementations and how to bind or link subsequent MapReduce jobs to the ones that precede them. At runtime, the Cascading “planner” figures out the optimal way to partition the pipe assembly into MapReduce jobs and manages the linkages between them (Figure 24-8).

Figure 24-8. How a Flow translates to chained MapReduce jobs
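
To make the idea of partitioning concrete, here is a toy sketch in plain Java. It is not Cascading's planner and does not use the Cascading API: the class `ToyPlanner`, the `Stage` enum, and the splitting rule are all hypothetical. The sketch assumes a linear pipe assembly and assumes that one MapReduce job can hold at most one grouping (one shuffle), so a new job begins whenever a second grouping is reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: names and rules here are assumptions, not Cascading's API.
class ToyPlanner {
    /** Simplified stage kinds for a linear pipe assembly. */
    enum Stage { EACH, GROUP_BY, EVERY }

    /**
     * Splits the assembly into jobs, assuming each job may contain
     * at most one GROUP_BY (a single shuffle between map and reduce).
     */
    static List<List<Stage>> plan(List<Stage> assembly) {
        List<List<Stage>> jobs = new ArrayList<>();
        List<Stage> current = new ArrayList<>();
        boolean hasGrouping = false;
        for (Stage stage : assembly) {
            if (stage == Stage.GROUP_BY) {
                if (hasGrouping) {
                    // A second grouping needs its own shuffle: close this job.
                    jobs.add(current);
                    current = new ArrayList<>();
                }
                hasGrouping = true;
            }
            current.add(stage);
        }
        if (!current.isEmpty()) {
            jobs.add(current);
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<Stage> assembly = Arrays.asList(
                Stage.EACH, Stage.GROUP_BY, Stage.EVERY,
                Stage.EACH, Stage.GROUP_BY, Stage.EVERY);
        List<List<Stage>> jobs = plan(assembly);
        for (int i = 0; i < jobs.size(); i++) {
            System.out.println("job " + (i + 1) + ": " + jobs.get(i));
        }
    }
}
```

In this model the six-stage assembly in `main` becomes two chained jobs, with the second job reading what the first one wrote. A real planner also has to handle joins, branches, and merges, which this sketch ignores.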

Because of this, developers can build applications of arbitrary granularity. They can start with a small application that just filters a logfile, then iteratively build more features into the application as needed.
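
The incremental style described here can be sketched with an in-memory stand-in. The example uses plain Java streams rather than the Cascading API; the class `LogPipeline`, its `then` and `run` methods, and the sample log lines are hypothetical. It starts as a filter over log lines and then gains further steps without the earlier ones being rewritten.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// In-memory stand-in for a growing pipeline; not the Cascading API.
class LogPipeline {
    private final List<Function<Stream<String>, Stream<String>>> steps = new ArrayList<>();

    /** Appends one more processing step and returns this pipeline for chaining. */
    LogPipeline then(Function<Stream<String>, Stream<String>> step) {
        steps.add(step);
        return this;
    }

    /** Runs every step, in the order added, over the given lines. */
    List<String> run(List<String> lines) {
        Stream<String> stream = lines.stream();
        for (Function<Stream<String>, Stream<String>> step : steps) {
            stream = step.apply(stream);
        }
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "INFO started", "ERROR disk full", "ERROR disk full", "WARN slow");

        // First version: just filter the logfile.
        LogPipeline pipeline = new LogPipeline()
                .then(lines -> lines.filter(line -> line.startsWith("ERROR ")));
        System.out.println(pipeline.run(log));

        // Later versions: add features on top of the existing filter.
        pipeline.then(lines -> lines.map(line -> line.substring("ERROR ".length())))
                .then(Stream::distinct);
        System.out.println(pipeline.run(log));
    }
}
```

Each added step builds on the stream produced by the previous ones, which is the property that lets a small filter grow into a larger application one piece at a time.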

Because Cascading is an API and not a syntax like strings of SQL, it is more flexible. First, developers can create domain-specific languages (DSLs) using their favorite languages, such as Groovy, JRuby, Jython, Scala, and others (see the project site for examples). Second, developers can extend various parts of Cascading, for example to read and write custom Thrift or JSON objects and pass them through the tuple stream.
