Tap outputTap = new Hfs(new TextDelimited(true, ","), output_dir);
Pipe outputPipe = new Pipe("output pipe", joinPipe);

// The Flow definition hooks it all together
FlowDef flowDef = FlowDef.flowDef()
    .addSource(salesPipe, websalesTap)
    .addSource(usersPipe, usersTap)
    .addTailSink(outputPipe, outputTap);
flowConnector.connect(flowDef).complete();
  }
}
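The flow above joins web-sales records with user records on a shared key and writes the result as comma-delimited text. As a rough illustration of what that join computes, here is an in-memory sketch in plain Java with no Cascading dependency; the field names and sample rows are hypothetical, not taken from the listing:

```java
import java.util.*;

public class SimpleJoinSketch {
    // Inner join of sales rows (user id, amount) against a user-id -> name map,
    // emitting comma-delimited lines much like the flow's TextDelimited sink
    static List<String> join(Map<String, String> users, List<String[]> sales) {
        List<String> out = new ArrayList<>();
        for (String[] sale : sales) {
            String name = users.get(sale[0]);
            if (name != null) { // rows with no matching user are dropped
                out.add(sale[0] + "," + name + "," + sale[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> users = new HashMap<>();
        users.put("u1", "Alice");
        users.put("u2", "Bob");
        List<String[]> sales = Arrays.asList(
            new String[]{"u1", "19.99"},
            new String[]{"u3", "7.50"}); // u3 has no user_info row
        for (String line : join(users, sales)) {
            System.out.println(line);
        }
    }
}
```

The real flow performs the same logical operation, but Cascading plans it as MapReduce jobs so the join scales beyond memory.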
Deploying a Cascading Application on a Hadoop Cluster
Once your application is working as expected on small amounts of local test data, you can deploy the same packaged JAR file to a Hadoop cluster with no code modifications. Remember to add the Cascading JAR files to your application's lib directory, and make sure your source data is available in HDFS. Then move your application to a node on your Hadoop cluster and launch it using the hadoop jar command.
Listing 9.6 shows an example of running a Cascading application on a Hadoop cluster.
Listing 9.6 Running your Cascading application on a Hadoop cluster
# Make sure your source data is available in HDFS
$> hadoop dfs -put websales.csv /user/hduser/websales.csv
$> hadoop dfs -put user_info.csv /user/hduser/user_info.csv
# Run the hadoop jar command
$> hadoop jar mycascading.jar /user/hduser/websales.csv \
/user/hduser/user_info.csv output_directory
INFO util.HadoopUtil: resolving application jar from found
main method on: CascadingSimpleJoinPipe
INFO planner.HadoopPlanner: using application jar:
/home/hduser/mycascading.jar
INFO property.AppProps: using app.id: 35FEB5D0590D62AFA6D496F3F17C14B9
INFO mapred.FileInputFormat: Total input paths to process : 1
# etc...
If you are relatively new to Hadoop, and Cascading is your introduction to writing custom JAR files for the framework, take a moment to appreciate what is happening behind the scenes of the hadoop jar command. A Hadoop cluster comprises a collection of services with specialized roles. Services known as JobTrackers are responsible for keeping track of individual tasks and sending them to services on other machines. TaskTrackers are the cluster's workers; these services accept tasks from the