Beyond MapReduce - Enterprise Data Workflows with Cascading


CHAPTER 6

Beyond MapReduce

Applications and Organizations

Overall, the notion of an Enterprise data workflow spans well beyond Hadoop, integrating many different kinds of frameworks and processes. Consider the architecture in Figure 6-1 as a strawman that shows where a typical Enterprise data workflow runs. In the center is a workflow that consumes unstructured data (most likely some kind of machine data, such as log files) plus more structured data from another framework, such as customer profiles. That workflow runs on an Apache Hadoop cluster, and possibly on other topologies, such as in-memory data grids (IMDGs).
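To make the data flow at the center of Figure 6-1 more concrete, here is a minimal plain-Java sketch of a workflow that parses unstructured log lines and joins them with structured customer profiles. It is not Cascading code. The class `LogProfileJoin`, the log format `<timestamp> <customerId> <action>`, and the `region` profile field are all hypothetical assumptions. On a real cluster the join would run as a distributed job, not over an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (plain Java, not the Cascading API): enrich messy
// machine data with structured profile data, as in the center of Figure 6-1.
final class LogProfileJoin {

    // One enriched record: a log event joined with the customer's profile.
    record Event(String customerId, String action, String region) {}

    // Parses lines assumed to look like "<timestamp> <customerId> <action>"
    // and joins each one against an in-memory profile table (customerId -> region).
    static List<Event> enrich(List<String> logLines, Map<String, String> profiles) {
        List<Event> out = new ArrayList<>();
        for (String line : logLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 3) {
                continue; // skip lines that do not match the assumed format
            }
            String region = profiles.get(fields[1]);
            if (region == null) {
                continue; // inner-join semantics: no matching profile, no output
            }
            out.add(new Event(fields[1], fields[2], region));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "2013-05-01T12:00:00 c1 view",
                "2013-05-01T12:00:05 c2 purchase",
                "garbage line",
                "2013-05-01T12:00:09 c9 view");
        Map<String, String> profiles = Map.of("c1", "west", "c2", "east");
        enrich(logs, profiles).forEach(System.out::println);
    }
}
```

Skipping unparseable lines instead of failing the whole run reflects how messy machine data tends to be. The inner join is only one choice. A workflow could instead keep unmatched events and attach a default profile.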

Some of the results go directly to frontend use cases, for example being pushed into Memcached, which backs a customer API. Line-of-business use cases are what drive most of the need for Big Data apps.

Other results go to the back office. Enterprise organizations have almost always made substantial investments there: in data infrastructure for the back office, in the processes used to integrate systems and coordinate different departments, and in the people trained in those processes. Workflow results such as data cubes are pushed from the Hadoop cluster out to an analytics framework. In turn, those data cubes are consumed for reporting, data science work, customer support, and so on.
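As an illustration of the kind of data cube mentioned above, the following hypothetical sketch counts events over every combination of two dimensions, region and action. A `*` marks a dimension that has been rolled up. The `CubeSketch` class, the `Fact` record, and the `region|action` key encoding are assumptions made for illustration. In practice the cube would be computed on the cluster and then exported to the analytics framework.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a tiny two-dimensional data cube (region x action)
// with "*" roll-ups, the kind of aggregate a workflow might push to an
// analytics framework. Names and key encoding are illustrative assumptions.
final class CubeSketch {

    // Marker for a dimension that has been rolled up (aggregated away).
    static final String ALL = "*";

    // One fact row produced upstream by the workflow.
    record Fact(String region, String action) {}

    // Counts facts for all 2^2 = 4 groupings each row belongs to:
    // (region, action), (region, *), (*, action) and (*, *).
    // Cells are keyed as "region|action" and kept sorted for stable output.
    static Map<String, Long> cube(List<Fact> facts) {
        Map<String, Long> counts = new TreeMap<>();
        for (Fact f : facts) {
            for (String region : new String[] {f.region(), ALL}) {
                for (String action : new String[] {f.action(), ALL}) {
                    counts.merge(region + "|" + action, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Fact> facts = List.of(
                new Fact("west", "view"),
                new Fact("west", "purchase"),
                new Fact("east", "view"));
        cube(facts).forEach((cell, n) -> System.out.println(cell + " = " + n));
    }
}
```

Each row contributes to 2^n groupings over n dimensions. That is one reason cubes are often precomputed in a batch workflow and then served to reporting tools, rather than computed on demand.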
