CHAPTER 5
Cascalog—A Clojure DSL for Cascading
Why Use Cascalog?
Sometimes the tools we select change the way we approach a problem. As the proverb
goes, if all you have is a hammer, everything looks like a nail. And sometimes our tools,
over time, actually interfere with the process of solving a problem.
For most of the past three decades, SQL has been synonymous with database work. A
couple of generations of programmers have grown up with relational databases as the
de facto standard. Consider that while “NoSQL” has become quite a popular theme,
most vendors in the Big Data space have been rushing (circa 2013Q1) to graft SQL
features onto their frameworks.
If we look back four decades to the origins of the relational model, in Edgar Codd's
1970 paper “A Relational Model of Data for Large Shared Data Banks,” the point was
about relational models, not so much about databases, tables, and structured
queries. Codd himself detested SQL. The relational model was formally specified as a
declarative “data sublanguage” (i.e., to be used within some other host language) based
on first-order predicate logic. SQL is not that. By comparison, it forces programmers to
focus largely on control flow issues and the structure of tables, to a much greater extent
than the relational model intended. SQL's semantics are also disjoint from the
programming languages in which it gets used: Java, C++, Ruby, PHP, etc. For that matter,
the term “relational” no longer even appears in the SQL-92 specifications.
Codd's intent, effectively, was to avoid introducing unnecessary complexities that would
hamper software systems. He articulated a process for structuring data as relations of
tuples, as opposed to using structured data that is managed in tables. He also intended
queries to be expressed within what we would now call a DSL. Those are subtle points
that have enormous implications, which we'll explore in Chapter 7.
 