• The practice of programming this DAG separates logical aspects from control
aspects.
Moseley and Marks also point toward a management approach indicated by FRP. For
example, an organization could focus one team on minimizing the accidental aspects
of a system. Other teams could then focus on the essential aspects, providing the
infrastructure and the requirements for interfacing with other systems. Roughly speaking,
that corresponds respectively to the roles of developer, data scientist, ops, etc.; however,
the objectives of those teams become clarified through FRP. It also fits well with what
is shown in Figure 6-3 for cross-team functional integration based on Cascading.
Enterprise vs. Start-Ups
In summary, there are several theoretical aspects of the workflow abstraction. These get
leveraged in Cascading and the DSLs to help minimize the complexity of the engineering
process, and the complexity of understanding systems.
Generally speaking, in terms of Enterprise data workflows, there are two avenues to the
party—scale versus complexity—a contrast that is seen quite starkly in use case analysis
of Cascading deployments.
On one hand, there are Enterprise firms where people must contend with complexity
at scale all day, every day. Incumbents in the Enterprise space make very large
investments in their back office infrastructure and practices (generally using Java, ANSI SQL,
SAS, etc.) and have a large staff trained in those systems and processes. While the
incumbents typically face considerable challenges in trying to be innovative, they must also
balance multiple priorities for migrating workflows onto Apache Hadoop. One priority is
based on economics: scaling out a machine learning app on a Hadoop cluster implies
much less in licensing costs than running the app in SAS. Another priority is risk
management: being able to scale efficiently and rapidly when the business requires it.
Meanwhile, a big part of the challenge is to leverage existing staff and integrate infrastructure
without disrupting established processes. The workflow abstraction in Cascading
addresses those issues directly.
On the other hand, start-ups crave complexity and must scale to become viable.
Start-ups are generally good at innovation and light on existing process. They tend to leverage
sophisticated engineering practices (e.g., Cascalog and Scalding) so that they can have
a relatively lean staff while positioning to compete against the Enterprise incumbents
and disrupt their market share. Cascading provides the foundation for DSLs in
functional programming languages that help power those ventures.
There is a transition curve plotted along the dimensions of scale, complexity, and
innovation. One perspective of this is shown in Figure 7-2.