Figure 10.8 Functional programming means that you can't guarantee the
order in which items will be transformed or what items will finish first.
(Figure labels: start order 3, 1, 2; execution time 5 sec, 2 sec, 3 sec.)
This variation in transformation times and the location of the input data adds an
additional burden on the scheduling system. In distributed systems, input data is
replicated on multiple nodes in the cluster. To be efficient, you want the longest-running
jobs to start first. To maximize your resources, the tools that place tasks on different
nodes in a cluster must be able to gather processing information from multiple data
sources. Schedulers that can determine how long a transform will take to run on
different nodes and how busy each node is will be most efficient. This information is
generally not provided by imperative systems. Even mature systems like HDFS and
MapReduce continue to refine their ability to efficiently transform large datasets.
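
The "longest-running jobs first" idea can be illustrated with a minimal sketch. It assumes the scheduler already has an estimated run time for each task, which the text notes is often not available, and it ignores where the input data is replicated. The function name, task names, and node count are hypothetical; the durations reuse the 5, 2, and 3 second execution times from figure 10.8. This is a simple greedy heuristic, not the algorithm used by any particular scheduler.

```python
import heapq


def schedule_longest_first(estimates, node_count):
    """Greedy sketch: give each task, longest first, to the least-loaded node.

    estimates  -- dict mapping a (hypothetical) task name to estimated seconds
    node_count -- number of worker nodes, assumed identical
    """
    # Min-heap of (accumulated load in seconds, node index).
    nodes = [(0, i) for i in range(node_count)]
    heapq.heapify(nodes)
    assignment = {i: [] for i in range(node_count)}

    # Longest estimated task first; ties are broken by name for determinism.
    ordered = sorted(estimates.items(), key=lambda kv: (-kv[1], kv[0]))
    for task, seconds in ordered:
        load, node = heapq.heappop(nodes)   # least-loaded node so far
        assignment[node].append(task)
        heapq.heappush(nodes, (load + seconds, node))

    # The finish time of the whole batch is the load of the busiest node.
    makespan = max(load for load, _ in nodes)
    return assignment, makespan


# Hypothetical task names with the execution times shown in figure 10.8.
estimates = {"a": 5, "b": 2, "c": 3}
assignment, makespan = schedule_longest_first(estimates, node_count=2)
print(assignment, makespan)
```

With two nodes, the 5-second task gets a node to itself while the 3-second and 2-second tasks share the other, so the batch finishes in 5 seconds. A real scheduler would also have to weigh which nodes hold a replica of the input data.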
10.1.3 Comparing imperative and functional programming at scale
Now let's compare the capability of imperative and functional systems to support
processing large amounts of shared data being accessed by many concurrent CPUs. A
comparison of imperative versus functional pipelines is shown in figure 10.9.
You can see that when you prevent writes during a transformation, you get the
benefit of no side effects. This means that you can restart a failed transformation and be
certain that if it didn't finish, the external state of a system wasn't already updated.
With imperative systems, you can't make this guarantee. Any external changes may
need to be undone if there's a failure during a transformation. Keeping track of which
operations have been done can add complexity that will slow large systems down. The
Figure 10.9 Imperative programming (left panel) and functional programming (right
panel) use different rules when transforming data. To gain the benefits of referential
transparency, output of a transform must be completely determined by the inputs to the
transform. No other memory should be read or written during the transformation process.
Instead of a pipe with holes on the left, you can visualize your transformation pipes as
having solid steel sides that don't transfer any information.
(Figure labels: left panel, "Imperative programming": data in, data leakage, side
effects, data out. Right panel, "Functional programming": data in, solid steel sides,
data out.)
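
The restart guarantee described in this section can be sketched in a few lines. The example below is illustrative only: the function names, the `totals` dictionary standing in for shared external state, and the `crash_at` parameter used to simulate a node failure are all assumptions made for the sketch.

```python
def double_all_imperative(records, totals, crash_at=None):
    """Imperative style: writes to shared external state while it runs."""
    for i, value in enumerate(records):
        if i == crash_at:
            raise RuntimeError("simulated node failure")
        totals["sum"] += value * 2      # side effect visible outside the function


def double_all_functional(records, crash_at=None):
    """Functional style: the output is determined only by the input."""
    out = []
    for i, value in enumerate(records):
        if i == crash_at:
            raise RuntimeError("simulated node failure")
        out.append(value * 2)           # local only; discarded if the run fails
    return out


records = [1, 2, 3]

# Imperative: the first attempt fails partway through, then the job is restarted.
totals = {"sum": 0}
try:
    double_all_imperative(records, totals, crash_at=2)
except RuntimeError:
    pass                                # the partial writes are already in totals
double_all_imperative(records, totals)  # naive restart counts two records twice

# Functional: a failed attempt leaves nothing behind, so a restart is safe.
try:
    double_all_functional(records, crash_at=2)
except RuntimeError:
    pass
result = double_all_functional(records)

print(totals["sum"], sum(result))
```

The imperative restart ends with a total of 18 instead of the correct 12, because the two records processed before the failure were written to `totals` twice. Avoiding that would require tracking and undoing the partial writes, which is the bookkeeping cost the text describes. The functional version needs no such tracking.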