data/iris.rf.tsv out/classify out/trap \
--pmml data/iris.rf.xml \
--assert \
--measure out/measure
First, we clear the out directory used for output files, because Hadoop checks for it
and fails the app rather than overwrite existing data. We specify the input data source
data/iris.rf.tsv, the output data sink out/classify/*, and out/trap as a trap sink. The
latter catches bad input data. The --pmml data/iris.rf.xml command-line argument
specifies our PMML model.
Note that we add --assert and --measure as optional command-line arguments. For
each tuple in the data, a stream assertion tests whether the predicted field matches the
score field generated by the model in R. Tuples that fail that assertion get trapped into
out/trap/part* for inspection later. Also, a confusion matrix gets written to
out/measure/part* output, based on species as the predicted field. We measure the
performance of the predictive model, counting how many false positives or false negatives
result.
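The two checks described above can be sketched outside of Cascading in a few lines of
Python. This is only an illustration of the logic, not the Pattern implementation: the
function names are hypothetical, the sample rows are made up (the last one is
deliberately inconsistent to exercise the trap path), and comparing the predict and
score columns is an assumption about which fields the assertion tests.

```python
from collections import Counter

# Hypothetical sample rows as (species, predict, score) tuples.
# The last row is deliberately inconsistent; it is not from the actual run.
rows = [
    ("setosa", "setosa", "setosa"),
    ("versicolor", "virginica", "virginica"),
    ("virginica", "virginica", "versicolor"),
]


def split_by_assertion(rows):
    """Separate rows whose predict and score agree from rows that fail the check."""
    passed, trapped = [], []
    for row in rows:
        _species, predict, score = row
        (passed if predict == score else trapped).append(row)
    return passed, trapped


def confusion_counts(rows):
    """Count (species, score) pairs, the shape of a confusion matrix."""
    return Counter((species, score) for species, _predict, score in rows)


passed, trapped = split_by_assertion(rows)
matrix = confusion_counts(passed)

print("trapped:", trapped)
for (species, score), count in sorted(matrix.items()):
    print(species, score, count)
```

Rows that fail the comparison play the role of the tuples written to out/trap/part*,
and the counter plays the role of the out/measure/part* output.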
The output shows that the model had a 100% success rate for the regression test. If there
had been any difference between the Pattern results and the R results, Cascading stream
assertions would have rejected those output tuples and shown exceptions in the console
log:
$ head out/classify/part-00000
sepal_length sepal_width petal_length petal_width species predict score
5.1 3.5 1.4 0.2 setosa setosa setosa
4.9 3 1.4 0.2 setosa setosa setosa
4.7 3.2 1.3 0.2 setosa setosa setosa
4.6 3.1 1.5 0.2 setosa setosa setosa
5 3.6 1.4 0.2 setosa setosa setosa
5.4 3.9 1.7 0.4 setosa setosa setosa
4.6 3.4 1.4 0.3 setosa setosa setosa
5 3.4 1.5 0.2 setosa setosa setosa
4.4 2.9 1.4 0.2 setosa setosa setosa
$
$ head out/measure/part-00000
species score count
setosa setosa 50
versicolor versicolor 48
versicolor virginica 2
virginica versicolor 1
virginica virginica 49
As expected, there is a small amount of error overall: 3 of the 150 tuples in the listing
above are misclassified, or 2%. The setosa species is always predicted correctly, whereas
the other two species have some overlap.
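To make the arithmetic behind the confusion matrix explicit, here is a minimal Python
sketch that tallies the misclassified tuples. The counts are copied from the
out/measure/part-00000 listing above; the function names are hypothetical and not part
of Pattern or Cascading.

```python
# Rows copied from the out/measure/part-00000 listing: (species, score, count).
measure = [
    ("setosa", "setosa", 50),
    ("versicolor", "versicolor", 48),
    ("versicolor", "virginica", 2),
    ("virginica", "versicolor", 1),
    ("virginica", "virginica", 49),
]


def error_rate(measure):
    """Fraction of tuples whose score differs from the labeled species."""
    total = sum(count for _species, _score, count in measure)
    wrong = sum(count for species, score, count in measure if species != score)
    return wrong / total


def errors_by_species(measure):
    """Number of misclassified tuples for each labeled species."""
    errors = {}
    for species, score, count in measure:
        errors.setdefault(species, 0)
        if species != score:
            errors[species] += count
    return errors


print(error_rate(measure))         # 3 misclassified out of 150 tuples
print(errors_by_species(measure))  # setosa has no errors
```

The off-diagonal rows (versicolor scored as virginica, and the reverse) are the only
source of error, which matches the overlap between those two species.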