$ rm -rf out
$ hadoop jar build/libs/pattern-examples-*.jar \
data/sample.tsv out/classify.lr out/trap \
--pmml sample.lr.xml --measure out/measure
$ mv out/classify.lr .
It would be reasonably simple to build a Cascading app to do the comparisons between
models, i.e., a framework for customer experiments. That would be especially useful if
there were a large number of models to compare. In this case, we can compare results
using a spreadsheet as shown in Figure 6-10.
Figure 6-10. Customer experiment
The model based on Logistic Regression has a lower false negative (FN) rate: 5% versus
11%. However, that model has a much higher false positive (FP) rate: 52% versus 14%.
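To make the arithmetic behind those percentages explicit, here is a small sketch. The confusion-matrix counts below are hypothetical, chosen only so that the resulting rates match the percentages quoted above:

```python
# Hypothetical confusion-matrix counts; only the derived FN and FP rates
# correspond to the percentages cited in the text.
def rates(tp, fp, fn, tn):
    fn_rate = fn / (fn + tp)   # false negative rate: frauds that slip through
    fp_rate = fp / (fp + tn)   # false positive rate: legit orders flagged
    return fn_rate, fp_rate

lr_fn, lr_fp = rates(tp=95, fp=52, fn=5, tn=48)    # Logistic Regression
rf_fn, rf_fp = rates(tp=89, fp=14, fn=11, tn=86)   # Random Forest

print(f"LR: FN={lr_fn:.0%}, FP={lr_fp:.0%}")   # LR: FN=5%, FP=52%
print(f"RF: FN={rf_fn:.0%}, FP={rf_fp:.0%}")   # RF: FN=11%, FP=14%
```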
Let's put this into terms that decision makers use in business to determine which model
is better. For example, in the case of an anti-fraud classifier used in ecommerce, we can
assign a cost function to select a winner of the experiment. On one hand, a higher rate
of false negatives implies that more fraudulent orders fail to get flagged for review.
Ultimately that results in a higher rate of chargeback fines from the bank, and punitive
actions by the credit card processor if that rate goes too high for too long. So the FN
rate is proportional to chargeback risk in ecommerce. On the other hand, a higher rate
of false positives implies that more legitimate orders get flagged for review. Ultimately
that results in more complaints from actual customers, and higher costs for customer
support. So the FP rate is proportional to support costs in ecommerce.
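One way to turn those two risks into a single number is a simple expected-cost function. The unit costs below are hypothetical placeholders (real values would come from the business); the point of the sketch is that which model wins depends on the ratio of chargeback cost to review cost:

```python
# Hypothetical per-order unit costs; real values come from the business.
CHARGEBACK_COST = 25.0   # chargeback fine plus lost goods per missed fraud (FN)
REVIEW_COST = 2.0        # customer support cost per legitimate order flagged (FP)

def expected_cost(fn_rate, fp_rate,
                  chargeback=CHARGEBACK_COST, review=REVIEW_COST):
    """Rough per-order cost, weighting each error rate by its price.
    (A fuller treatment would also weight by the fraud base rate.)"""
    return fn_rate * chargeback + fp_rate * review

# Rates quoted in the text
lr = expected_cost(0.05, 0.52)   # Logistic Regression
rf = expected_cost(0.11, 0.14)   # Random Forest
print(f"LR: {lr:.2f}  RF: {rf:.2f}")   # LR: 2.29  RF: 3.03 -- LR wins here

# Raise the review (support) cost and the winner flips to Random Forest
print(expected_cost(0.05, 0.52, review=10.0) >
      expected_cost(0.11, 0.14, review=10.0))   # True
```

At a $25 chargeback versus $2 review cost, the Logistic Regression model's low FN rate wins; make reviews expensive enough and the Random Forest model's low FP rate wins instead.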
Evaluating this experiment, the Logistic Regression model (which had a variable omitted
to exaggerate the comparison) resulted in approximately half the FN rate, compared
with the Random Forest model. However, it also resulted in quadrupled costs for customer
support. A decision maker can use those cost trade-offs to select the appropriate
model for the business needs.
 