However, any jobs running on the cluster will continue running; they will not be killed by Crunch.
Instead, to stop a pipeline properly, launch it asynchronously so that you retain a reference to the PipelineExecution object:
    PipelineExecution execution = pipeline.runAsync();
Stopping the pipeline and its jobs is then just a matter of calling the kill() method on PipelineExecution, and waiting for the pipeline to complete:
    execution.kill();
    execution.waitUntilDone();
At this point, the PipelineExecution's status will be PipelineExecution.Status.KILLED, and any previously running jobs on the cluster from this pipeline will have been killed. One place where this pattern applies well is a Java VM shutdown hook, which can safely stop a currently executing pipeline when the Java application is shut down using Ctrl-C.
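The shutdown-hook pattern described above can be sketched as follows. This is a minimal illustration, not Crunch's own code: the Killable interface and RecordingExecution class are hypothetical stand-ins that mirror only the kill() and waitUntilDone() methods named in the text, so that the sketch compiles without Crunch on the classpath. In a real application the hook would delegate to the PipelineExecution returned by pipeline.runAsync().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShutdownHookSketch {

    /**
     * Hypothetical stand-in for the two PipelineExecution methods used in the
     * text, so this sketch does not depend on the Crunch library.
     */
    interface Killable {
        void kill() throws InterruptedException;
        void waitUntilDone() throws InterruptedException;
    }

    /** Builds (but does not register) a hook thread that kills the execution, then waits. */
    static Thread newShutdownHook(Killable execution) {
        return new Thread(() -> {
            try {
                execution.kill();
                execution.waitUntilDone();
            } catch (InterruptedException e) {
                // Restore the interrupt flag; the JVM is exiting anyway.
                Thread.currentThread().interrupt();
            }
        }, "pipeline-shutdown-hook");
    }

    /** Fake execution that only records which methods were called. */
    static class RecordingExecution implements Killable {
        final List<String> calls = Collections.synchronizedList(new ArrayList<>());
        @Override public void kill() { calls.add("kill"); }
        @Override public void waitUntilDone() { calls.add("waitUntilDone"); }
    }

    public static void main(String[] args) {
        RecordingExecution execution = new RecordingExecution();
        // With Crunch, the Killable would delegate to the object from pipeline.runAsync().
        Runtime.getRuntime().addShutdownHook(newShutdownHook(execution));
        System.out.println("Hook registered; it runs when the JVM shuts down.");
    }
}
```

Registering the hook right after launching the pipeline means Ctrl-C triggers kill() before the JVM exits, and the wait keeps the JVM alive until the cluster jobs have actually been stopped.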
NOTE
PipelineExecution implements Future<PipelineResult>, so calling kill() can achieve the same effect as calling cancel(true).
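For readers less familiar with the Future contract that the note relies on, the following sketch shows what cancel(true) does for an ordinary java.util.concurrent task. It uses only the JDK and says nothing about Crunch internals; the sleeping task is a placeholder assumed here to stand in for a long-running pipeline.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureCancelSketch {

    /** Starts a long-running task, cancels it with interruption, and reports the outcome. */
    static String cancelRunningTask() throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        try {
            Future<String> future = executor.submit(() -> {
                started.countDown();
                Thread.sleep(60_000); // placeholder for long-running work
                return "finished";
            });
            started.await();      // make sure the task is actually running
            future.cancel(true);  // true = interrupt the thread running the task
            try {
                future.get();
                return "completed";
            } catch (CancellationException e) {
                return "cancelled";
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(cancelRunningTask()); // prints "cancelled"
    }
}
```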
Inspecting a Crunch Plan
Sometimes it is useful, or at least enlightening, to inspect the optimized execution plan. The following snippet shows how to obtain a DOT file representation of the graph of operations in a pipeline as a string, and write it to a file (using Guava's Files utility class). It relies on having access to the PipelineExecution returned from running the pipeline asynchronously:
    PipelineExecution execution = pipeline.runAsync();
    String dot = execution.getPlanDotFile();
    Files.write(dot, new File("pipeline.dot"), Charsets.UTF_8);
    execution.waitUntilDone();
    pipeline.done();
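If Guava is not available, the file-writing step in the snippet above can be done with the JDK's java.nio.file.Files instead. The sketch below is an alternative under that assumption, not the approach the text uses: the writeDot helper is a hypothetical name, and the DOT string is a placeholder for the value that execution.getPlanDotFile() would return.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WriteDotSketch {

    /** Writes the DOT text to the given path as UTF-8, replacing any existing file. */
    static Path writeDot(String dot, Path target) throws IOException {
        return Files.write(target, dot.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder graph; with Crunch this string would come from execution.getPlanDotFile().
        String dot = "digraph G {\n  read -> write;\n}\n";
        Path written = writeDot(dot, Paths.get("pipeline.dot"));
        System.out.println("Wrote " + written.toAbsolutePath());
    }
}
```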
The dot command-line tool converts the DOT file into a graphical format, such as PNG, for easy inspection. The following invocation converts all DOT files in the current directory to PNG format, so pipeline.dot is converted to a file called pipeline.dot.png:
    % dot -Tpng -O *.dot