is a kind of “one-two punch” in Cascading, leveraging computer science theory in different layers.
Books about Separation of Concerns
For more information about literate programming and separation of concerns:
Literate Programming by Donald Knuth (Stanford)
Elements of Functional Programming by Chris Reade (Addison-Wesley)
Functional Relational Programming
Cascalog developers describe the separation of concerns between business process and
implementation (parallelization, etc.) as a principle: “specify what you require, not how
it must be achieved.” That principle matters because, in practice, developing Enterprise
data workflows is arguably an inherently complex matter. Frameworks for distributed
systems such as Hadoop, HBase, Cassandra, and Memcached introduce considerable
complexity into the engineering process. The typical problems being solved, which often
use machine learning algorithms to find a proverbial needle in a haystack within large
data sets, also add significant complexity to apps.
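As a rough illustration of “specify what you require, not how it must be achieved,” here is a minimal sketch that uses plain Java streams rather than Cascading or Cascalog. The class name `WhatNotHow` and the method `wordCount` are hypothetical. The word-count specification never mentions threads, partitions, or scheduling. The caller decides separately whether that specification runs sequentially or in parallel.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WhatNotHow {
    // The "what": a word count, stated once, with no execution details.
    static Map<String, Long> wordCount(Stream<String> lines) {
        return lines
            .flatMap(line -> Stream.of(line.toLowerCase().split("\\s+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> docs = List.of("A rose is a rose", "is a rose");
        // The "how": the same specification executed two different ways.
        Map<String, Long> sequential = wordCount(docs.stream());
        Map<String, Long> parallel = wordCount(docs.parallelStream());
        System.out.println(sequential.equals(parallel)); // prints "true"
        System.out.println(sequential.get("rose"));      // prints "3"
    }
}
```

The point of the design is that parallelization becomes a choice made outside the business logic, so the logic itself stays as small as the problem it describes.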
The author of Cascalog, Nathan Marz, noted a general problem with Big Data frameworks:
the tools used to solve a given problem can sometimes introduce more complexity than
the problem itself. We call this phenomenon accidental complexity, and it represents an
important anti-pattern in computer science.
A lot of people talk about how wonderfully expressive Clojure is. However, expressiveness
is not the goal of Clojure. Clojure aims to minimize accidental complexity, and its
expressiveness is a means to that end.
— Nathan Marz
Twitter (2011)
There are limits to how much complexity people can understand at any given point, and
limits to how well we can understand the systems on which we rely. Some approaches to
software design amplify that problem. For example, reading 50,000 lines of COBOL is not
particularly simple. SQL and Java are notorious for encouraging the development of large,
complicated apps. So it makes sense to keep artifacts of our programming languages from
making Enterprise data workflows even more complex.
In the original 1969 paper about the relational model, Edgar Codd focused on the process
of structuring data as a mechanism for maintaining data integrity and consistency of
state, while providing a separation of concerns regarding data storage
and representation underneath. This description is quite apt for the workflow abstrac‐