However, any jobs running on the cluster will continue running — they will not be killed
by Crunch.
Instead, to stop a pipeline properly, launch it asynchronously so that you retain a
reference to the PipelineExecution object:
PipelineExecution execution = pipeline.runAsync();
Stopping the pipeline and its jobs is then just a question of calling the kill() method on
PipelineExecution, and waiting for the pipeline to complete:
execution.kill();
execution.waitUntilDone();
At this point, the PipelineExecution's status will be PipelineExecution.Status.KILLED,
and any jobs from this pipeline that were running on the cluster will have been killed.
One place to apply this pattern is a Java VM shutdown hook, which can safely stop a
currently executing pipeline when the Java application is shut down using Ctrl-C.
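The shutdown-hook pattern can be sketched as follows. This is a minimal illustration, not code from Crunch itself: the class name ShutdownHookSketch, the Execution interface, and the newKillHook() helper are all hypothetical names introduced here. Execution stands in for the two PipelineExecution methods used above (kill() and waitUntilDone()), so that the sketch compiles without Crunch on the classpath.

```java
final class ShutdownHookSketch {

    // Hypothetical stand-in for the two PipelineExecution methods used in
    // the text; it exists only so this sketch has no Crunch dependency.
    interface Execution {
        void kill() throws InterruptedException;
        void waitUntilDone() throws InterruptedException;
    }

    // Builds (but does not register) a hook thread that kills the execution
    // and then blocks until the pipeline has finished shutting down.
    static Thread newKillHook(Execution execution) {
        return new Thread(() -> {
            try {
                execution.kill();
                execution.waitUntilDone();
            } catch (InterruptedException e) {
                // Restore the interrupt flag; nothing more can be done
                // this late in JVM shutdown.
                Thread.currentThread().interrupt();
            }
        }, "pipeline-kill-hook");
    }

    public static void main(String[] args) {
        // A fake execution that only prints, in place of a running pipeline.
        Execution demo = new Execution() {
            public void kill() { System.out.println("kill requested"); }
            public void waitUntilDone() { System.out.println("pipeline done"); }
        };
        // The JVM runs registered hooks on normal exit and on Ctrl-C.
        Runtime.getRuntime().addShutdownHook(newKillHook(demo));
    }
}
```

In a real application, the hook body would presumably call kill() and waitUntilDone() directly on the PipelineExecution returned by runAsync(), and the hook would be registered immediately after the pipeline is launched.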
NOTE
PipelineExecution implements Future<PipelineResult>, so calling kill() can achieve
the same effect as calling cancel(true).
Inspecting a Crunch Plan
Sometimes it is useful, or at least enlightening, to inspect the optimized execution plan.
The following snippet shows how to obtain a DOT file representation of the graph of
operations in a pipeline as a string, and write it to a file (using Guava's Files utility class).
It relies on having access to the PipelineExecution returned from running the
pipeline asynchronously:
PipelineExecution execution = pipeline.runAsync();
String dot = execution.getPlanDotFile();
Files.write(dot, new File("pipeline.dot"), Charsets.UTF_8);
execution.waitUntilDone();
pipeline.done();
The dot command-line tool converts the DOT file into a graphical format, such as PNG,
for easy inspection. The following invocation converts all DOT files in the current
directory to PNG format, so pipeline.dot is converted to a file called pipeline.dot.png:
% dot -Tpng -O *.dot