Integrating Pattern into Cascading Apps
Let's take a look at how to incorporate Pattern into a Cascading app. This requires only
two additional lines of source code. The following shows a minimal Cascading app that
uses Pattern, starting with the setup for a Main.java class:
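```java
import java.util.Properties;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

public class Main {
  public static void main( String[] args ) {
    String pmmlPath = args[ 0 ];     // PMML model file
    String inputPath = args[ 1 ];    // tab-delimited input data
    String classifyPath = args[ 2 ]; // classified output
    String trapPath = args[ 3 ];     // trapped tuples
    Properties properties = new Properties();
    AppProps.setApplicationJarClass( properties, Main.class );
    HadoopFlowConnector flowConnector =
      new HadoopFlowConnector( properties );
```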
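Because main() reads the four paths by position, running the app with too few arguments fails with an ArrayIndexOutOfBoundsException rather than a helpful message. If you want a clearer failure, a minimal sketch of validating the arguments up front is shown here; the AppArgs class, its parse() method, and the usage text are hypothetical helpers, not part of Cascading or Pattern.

```java
/** Hypothetical holder for the four positional arguments that Main reads. */
final class AppArgs {
  final String pmmlPath;     // args[ 0 ]: PMML model file
  final String inputPath;    // args[ 1 ]: tab-delimited input data
  final String classifyPath; // args[ 2 ]: classified output
  final String trapPath;     // args[ 3 ]: trapped tuples

  private AppArgs( String[] args ) {
    pmmlPath = args[ 0 ];
    inputPath = args[ 1 ];
    classifyPath = args[ 2 ];
    trapPath = args[ 3 ];
  }

  /** Fails fast with a usage message instead of an ArrayIndexOutOfBoundsException. */
  static AppArgs parse( String[] args ) {
    if( args.length < 4 )
      throw new IllegalArgumentException(
        "usage: Main <pmmlPath> <inputPath> <classifyPath> <trapPath>" );
    for( int i = 0; i < 4; i++ )
      if( args[ i ] == null || args[ i ].isBlank() )
        throw new IllegalArgumentException( "argument " + i + " must be a non-empty path" );
    return new AppArgs( args );
  }

  public static void main( String[] args ) {
    AppArgs parsed = parse( new String[]{ "model.pmml", "data/input.tsv", "out/classify", "out/trap" } );
    System.out.println( parsed.pmmlPath + " -> " + parsed.classifyPath ); // prints model.pmml -> out/classify
  }
}
```

In Main, the four args[ n ] lookups could then be replaced by a single AppArgs.parse( args ) call.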
Next, we define three Cascading taps, one each for the input, the classified output, and
the trap. Each reads or writes tab-delimited text with a header row:
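```java
    // TextDelimited( true, "\t" ): tab-separated text whose first line is a header
    Tap inputTap =
      new Hfs( new TextDelimited( true, "\t" ), inputPath );
    Tap classifyTap =
      new Hfs( new TextDelimited( true, "\t" ), classifyPath );
    // trapTap takes effect only once bound to a branch via FlowDef.addTrap()
    Tap trapTap =
      new Hfs( new TextDelimited( true, "\t" ), trapPath );
```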
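All three taps use the scheme TextDelimited( true, "\t" ): the boolean says the files carry a header line naming the fields, and the string sets the tab character as the field delimiter. As a rough plain-Java analogy, not Cascading's implementation, the sketch below reads such lines into field-name/value maps; the TsvWithHeader class and the order_id/amount sample columns are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical analogy for a header-bearing, tab-delimited text scheme. */
final class TsvWithHeader {
  /** Uses the first line as field names and maps each later line's values to them. */
  static List<Map<String, String>> parse( List<String> lines ) {
    List<Map<String, String>> rows = new ArrayList<>();
    if( lines.isEmpty() )
      return rows;
    String[] names = lines.get( 0 ).split( "\t", -1 );
    for( String line : lines.subList( 1, lines.size() ) ) {
      String[] values = line.split( "\t", -1 ); // -1 keeps trailing empty fields
      if( values.length != names.length )
        throw new IllegalArgumentException( "wrong field count: " + line );
      Map<String, String> row = new LinkedHashMap<>(); // preserves header order
      for( int i = 0; i < names.length; i++ )
        row.put( names[ i ], values[ i ] );
      rows.add( row );
    }
    return rows;
  }

  public static void main( String[] args ) {
    List<String> lines = List.of( "order_id\tamount", "1\t9.99", "2\t" );
    System.out.println( parse( lines ) ); // prints [{order_id=1, amount=9.99}, {order_id=2, amount=}]
  }
}
```

This sketch rejects a row whose field count does not match the header; Cascading's own handling of malformed lines may differ.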
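Then we define a FlowDef that binds the input and output taps, and use the PMMLPlanner
in Pattern to parse the predictive model and build a SubAssembly. The path to the PMML
file comes from the first command-line argument, pmmlPath:
```java
    // imports: java.io.File, cascading.flow.FlowDef, cascading.tuple.Fields,
    // and Pattern's cascading.pattern.pmml.PMMLPlanner
    FlowDef flowDef = FlowDef.flowDef().setName( "classify" ) // names are illustrative
      .addSource( "input", inputTap )
      .addSink( "classify", classifyTap );
    PMMLPlanner pmmlPlanner = new PMMLPlanner()
      .setPMMLInput( new File( pmmlPath ) )
      .retainOnlyActiveIncomingFields()
      .setDefaultPredictedField( new Fields( "predict", Double.class ) ); // default if the model declares none
    flowDef.addAssemblyPlanner( pmmlPlanner );
```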
Apart from Pattern's package import, the PMMLPlanner definition and the
addAssemblyPlanner() call are the only lines Pattern requires; the rest is standard
Cascading. In Cascalog or Scalding, this would require even less code.
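The retainOnlyActiveIncomingFields() call in the planner setup refers to the model's active fields. In a PMML document, each model's MiningSchema lists MiningField elements; a field whose usageType is "active", or which omits usageType (the default is "active"), is a model input, while "predicted" marks the model's output. As the method name suggests, the planner presumably keeps only those active fields on the tuples it feeds to the model. To see which input columns a given model expects, a small plain-Java sketch using only the JDK's DOM parser can list the active fields; the PmmlActiveFields class and the trimmed-down, not schema-valid sample model are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: lists the active (input) fields a PMML document declares. */
final class PmmlActiveFields {
  static List<String> activeFields( String pmmlXml ) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware( true ); // PMML elements live in a versioned default namespace
      Document doc = factory.newDocumentBuilder().parse(
        new ByteArrayInputStream( pmmlXml.getBytes( StandardCharsets.UTF_8 ) ) );

      // "*" matches MiningField in any namespace, e.g. PMML-4_0 or PMML-4_1.
      // Every MiningSchema in the file is scanned, so a multi-model document
      // yields the union of its active fields (duplicates removed, order kept).
      Set<String> active = new LinkedHashSet<>();
      NodeList fields = doc.getElementsByTagNameNS( "*", "MiningField" );
      for( int i = 0; i < fields.getLength(); i++ ) {
        Element field = (Element) fields.item( i );
        String usage = field.getAttribute( "usageType" ); // "" if the attribute is absent
        if( usage.isEmpty() || usage.equals( "active" ) ) // PMML's default usageType is "active"
          active.add( field.getAttribute( "name" ) );
      }
      return new ArrayList<>( active );
    } catch( ParserConfigurationException | SAXException | IOException e ) {
      throw new IllegalArgumentException( "could not parse PMML", e );
    }
  }

  public static void main( String[] args ) {
    String pmml = "<PMML xmlns=\"http://www.dmg.org/PMML-4_1\" version=\"4.1\">"
      + "<RegressionModel functionName=\"regression\"><MiningSchema>"
      + "<MiningField name=\"var0\"/>"
      + "<MiningField name=\"var1\" usageType=\"active\"/>"
      + "<MiningField name=\"predict\" usageType=\"predicted\"/>"
      + "<MiningField name=\"comment\" usageType=\"supplementary\"/>"
      + "</MiningSchema></RegressionModel></PMML>";
    System.out.println( activeFields( pmml ) ); // prints [var0, var1]
  }
}
```

Comparing this list with the header of the input file is a quick way to confirm that the data carries every column the model needs.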
Finally, we call the flow planner to create a physical plan, write that plan to a DOT
file for visualization, and then submit the job to Hadoop:
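```java
    // imports: cascading.flow.Flow
    Flow classifyFlow = flowConnector.connect( flowDef );
    classifyFlow.writeDOT( "dot/classify.dot" ); // physical plan as a Graphviz DOT graph
    classifyFlow.complete(); // runs the flow and blocks until it finishes
  }
}
```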