Integrating Pattern into Cascading Apps
Let's take a look at how to incorporate Pattern into a Cascading app. This requires only
two additional lines of source code. The following shows a minimal Cascading app that
uses Pattern, starting with the setup for a Main.java class:
public class Main {
  public static void main( String[] args ) {
    String pmmlPath = args[ 0 ];
    String inputPath = args[ 1 ];
    String classifyPath = args[ 2 ];
    String trapPath = args[ 3 ];

    Properties properties = new Properties();
    AppProps.setApplicationJarClass( properties, Main.class );
    HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );
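The setup above reads four positional arguments without checking how many were supplied, so a missing argument surfaces as an ArrayIndexOutOfBoundsException. As a minimal sketch, assuming a hypothetical helper class named ClassifyArgs (it is not part of Cascading or Pattern), the arguments could be validated and named before the flow is built:

```java
// Hypothetical helper, not part of Cascading or Pattern: it names and
// validates the four positional arguments that Main expects.
final class ClassifyArgs {
    final String pmmlPath;
    final String inputPath;
    final String classifyPath;
    final String trapPath;

    private ClassifyArgs(String pmmlPath, String inputPath,
                         String classifyPath, String trapPath) {
        this.pmmlPath = pmmlPath;
        this.inputPath = inputPath;
        this.classifyPath = classifyPath;
        this.trapPath = trapPath;
    }

    // Fails fast with a usage message instead of an index error.
    static ClassifyArgs parse(String[] args) {
        if (args == null || args.length != 4) {
            throw new IllegalArgumentException(
                "usage: Main <pmmlPath> <inputPath> <classifyPath> <trapPath>");
        }
        for (String arg : args) {
            if (arg == null || arg.trim().isEmpty()) {
                throw new IllegalArgumentException("paths must be non-empty");
            }
        }
        return new ClassifyArgs(args[0], args[1], args[2], args[3]);
    }
}
```

With this in place, main could start with ClassifyArgs.parse( args ) and read the paths from the returned object.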
Next, we define three Cascading taps for input, output, and trap:
    Tap inputTap = new Hfs( new TextDelimited( true, "\t" ), inputPath );
    Tap classifyTap = new Hfs( new TextDelimited( true, "\t" ), classifyPath );
    Tap trapTap = new Hfs( new TextDelimited( true, "\t" ), trapPath );
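Each tap above is built with TextDelimited( true, "\t" ), which we read as "tab-separated text whose first line is a header of field names." The following plain-Java sketch illustrates that file layout only. It is not how Cascading implements the scheme, and the class name TsvHeaderSketch is hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: shows the header-plus-tab-separated layout that the
// taps are configured for. This is not Cascading's implementation.
final class TsvHeaderSketch {
    // The first line supplies the field names; each later line is one record.
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] names = lines.get(0).split("\t", -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split("\t", -1);
            if (values.length != names.length) {
                throw new IllegalArgumentException("field count mismatch: " + line);
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < names.length; i++) {
                row.put(names[i], values[i]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Because the fields are named by the header rather than by position, later stages can refer to columns by name.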
Then we use the PMMLPlanner in Pattern to parse the predictive model and build a
SubAssembly. The path to the PMML file is passed as a command-line argument and held in
the pmmlPath variable in the following code:
    PMMLPlanner pmmlPlanner = new PMMLPlanner()
      .setPMMLInput( new File( pmmlPath ) )
      .retainOnlyActiveIncomingFields()
      .setDefaultPredictedField( new Fields( "predict", Double.class ) );

    flowDef.addAssemblyPlanner( pmmlPlanner );
Those are the only lines required for Pattern, other than its package import. In Cascalog
or Scalding, this would require even less code.
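To give a sense of what the planner is working from, a PMML file is an XML document whose DataDictionary element lists the model's fields as DataField entries. The sketch below uses only the JDK's XML parser to list those field names. It is not Pattern's parser, and the class name PmmlFieldsSketch and the sample document in the usage note are assumptions made for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustration only: lists the DataField names declared in a PMML document.
// Pattern's own PMML handling is far more complete than this.
final class PmmlFieldsSketch {
    static List<String> dataFieldNames(String pmmlXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    pmmlXml.getBytes(StandardCharsets.UTF_8)));
            // The default factory is not namespace-aware, so the plain
            // element name matches even when a default xmlns is declared.
            NodeList fields = doc.getElementsByTagName("DataField");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < fields.getLength(); i++) {
                names.add(((Element) fields.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse PMML", e);
        }
    }
}
```

Given a document whose DataDictionary declares fields named "sepal_length" and "predict", dataFieldNames would return those two names in document order.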
Finally, we call the flow planner to create a physical plan and then submit the job to
Hadoop:
    Flow classifyFlow = flowConnector.connect( flowDef );
    classifyFlow.writeDOT( "dot/classify.dot" );
    classifyFlow.complete();