The Workflow Abstraction - Enterprise Data Workflows with Cascading

is a kind of “one-two punch” in Cascading, leveraging computer science theory in different layers.

Books about Separation of Concerns

For more information about literate programming and separation of concerns:

• Literate Programming by Donald Knuth (Stanford)

• Elements of Functional Programming by Chris Reade (Addison-Wesley)

Functional Relational Programming

Cascalog developers describe the separation of concerns between business process and implementation (parallelization, etc.) as a principle: “specify what you require, not how it must be achieved.” That principle matters because developing Enterprise data workflows is, arguably, an inherently complex matter. Frameworks for distributed systems such as Hadoop, HBase, Cassandra, and Memcached introduce considerable complexity into the engineering process. The typical problems being solved, which often use machine learning algorithms to find a proverbial needle in a haystack within large data sets, add significant complexity to apps as well.
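To make “specify what you require, not how it must be achieved” concrete, here is a minimal sketch in plain Java. It is an analogy only: it uses the standard java.util.stream API rather than Cascading or Cascalog, and the class name WordCountSketch, the method wordCount, and the sample text are hypothetical. The wordCount method states what to compute (a count per word), while the choice between sequential and parallel execution is made separately by the caller.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not part of the Cascading or Cascalog APIs.
public class WordCountSketch {

    // The "what": a declarative word count. It says nothing about
    // threads, partitions, or scheduling.
    static Map<String, Long> wordCount(Stream<String> lines) {
        return lines
                .flatMap(line -> Stream.of(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty()) // drop empty tokens from leading whitespace
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "A rain shadow is a dry area",
                "on the lee side of a mountain");

        // The "how" is chosen separately: the same specification runs
        // sequentially or in parallel with no change to wordCount().
        Map<String, Long> sequential = wordCount(docs.stream());
        Map<String, Long> parallel = wordCount(docs.parallelStream());

        System.out.println(sequential.equals(parallel)); // true
        System.out.println(sequential.get("a"));         // 3
    }
}
```

Because wordCount never mentions threads or partitions, switching from stream() to parallelStream() changes how the work is executed without touching the business logic. The same idea, scaled up from one JVM to a cluster, is the kind of separation the principle above asks for.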

The author of Cascalog, Nathan Marz, noted a general problem with Big Data frameworks: the tools used to solve a given problem can sometimes introduce more complexity than the problem itself. This phenomenon is called accidental complexity, and it represents an important anti-pattern in computer science.

A lot of people talk about how wonderfully expressive Clojure is. However, expressiveness is not the goal of Clojure. Clojure aims to minimize accidental complexity, and its expressiveness is a means to that end.

— Nathan Marz

Twitter (2011)

There are limits to how much complexity people can understand at any given point, and limits to how well we can understand the systems on which we rely. Some approaches to software design amplify that problem. For example, reading 50,000 lines of COBOL is not particularly simple, and SQL and Java are notorious for encouraging the development of large, complicated apps. So it makes sense to keep artifacts of our programming languages from making Enterprise data workflows even more complex.

In the original 1969 paper on the relational model, Edgar Codd focused on structuring data as a mechanism for maintaining data integrity and consistency of state, while providing a separation of concerns from the data storage and representation underneath. This description is quite apt for the workflow abstrac‐
