Figure 6-3. Strawman—functional integration
To support this, Cascading includes two components, Lingual for ANSI SQL and Pattern
for PMML, which are effectively DSLs. These allow for all of the following:
• ETL that already runs as ANSI SQL can be used directly in Lingual flows.
• Data preparation can be handled in Cascading, Cascalog, Scalding, etc.
• Predictive models can be exported as PMML and used directly in Pattern flows, for
scoring at scale.
• Cascading taps integrate other frameworks for the data sources and sinks.
• All of this goes into one app, one JAR, which Ops can schedule and instrument with
much less complexity.
• Some optimizations may become possible for the flow planners and compiler as a
result of this integration.
In other words, the different departments have a way to collaborate on a combined app
that ties together and optimizes business processes across the organization.
Lingual, a DSL for ANSI SQL
Lingual is an extension to Cascading that executes ANSI SQL queries as Cascading apps.
This open source project is a collaboration between Cascading and Optiq, an
ANSI-compliant SQL parser/optimizer written by Julian Hyde, the author of
Mondrian. Julian wrote a good description of the project.
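To make the idea of running a SQL query as a data workflow more concrete, here is a minimal sketch in Python. It does not use Lingual, Optiq, or Cascading: the standard-library sqlite3 module stands in for the SQL front end, and a hand-written filter, group, and aggregate pipeline stands in for the kind of flow a planner might derive from the same query. The table name sales, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (dept, amount) rows, not from the original text.
ROWS = [("toys", 12.0), ("books", 5.0), ("toys", 3.0), ("books", 7.5), ("games", 1.0)]

# The business logic, stated declaratively in SQL.
QUERY = """
    SELECT dept, SUM(amount) AS total
    FROM sales
    WHERE amount >= 2.0
    GROUP BY dept
    ORDER BY dept
"""


def run_sql(rows):
    """Run QUERY with sqlite3, used here only as a stand-in SQL engine."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (dept TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


def run_pipeline(rows):
    """Express the same logic as explicit workflow steps."""
    kept = (row for row in rows if row[1] >= 2.0)  # WHERE amount >= 2.0
    totals = defaultdict(float)
    for dept, amount in kept:                      # GROUP BY dept, SUM(amount)
        totals[dept] += amount
    return sorted(totals.items())                  # ORDER BY dept


if __name__ == "__main__":
    print(run_sql(ROWS))
    print(run_pipeline(ROWS))
```

Both functions return the same rows for this data, which is the point of the sketch: the SQL text and the step-by-step pipeline describe one workflow in two notations.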
It is important to note that Lingual itself is not a database. Rather, it leverages the power
of SQL to describe the business logic for data workflows—as a kind of functional
programming. In that sense Lingual implements a domain-specific language (DSL)
where Cascading workflows get defined in SQL. Optiq provides compatibility with a
 