CHAPTER 11
Causality
Many of the models and examples in the book so far have focused
on the fundamental problem of prediction. We've discussed examples
like the one in Chapter 8, where your goal was to build a model to
predict whether a person would be likely to prefer a certain item,
such as a movie or a book. There may be thousands of features that
go into the model, and you may use feature selection to narrow those
down, but ultimately the model is optimized to get the highest
accuracy. When you optimize for accuracy, you don't necessarily worry
about the meaning or interpretation of the features, and when there
are thousands of features, it's well-nigh impossible to interpret
them at all.
Additionally, you wouldn't even want to make the statement that certain
characteristics caused the person to buy the item. So, for example,
your model for predicting or recommending a book on Amazon could
include a feature “whether or not you've read Wes McKinney's O'Reilly
book Python for Data Analysis.” We wouldn't say that reading his book
caused you to read this book. It just might be a good predictor, which
would have been discovered and come out as such in the process of
optimizing for accuracy. We wish to emphasize here that it's not simply
the familiar correlation-causation distinction you've perhaps had
drilled into your head already, but rather that your intent when
building such a model or system was not to understand causality at all,
but to predict. If your intent were to build a model that helps you get
at causality, you would go about it in a different way.
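To make the difference between predicting and getting at causality concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the hidden "interest" variable, the probabilities, and the function names simulate and rate_b are hypothetical and do not come from the book. A hidden interest in data analysis drives both reading book A and reading book B, so A predicts B in observed data, yet forcing people to read (or not read) A leaves the rate of reading B unchanged.

```python
import random


def simulate(n, force_read_a=None, seed=0):
    """Simulate n readers as (read_a, read_b) pairs.

    Hypothetical setup: a hidden interest in data analysis raises the
    chance of reading each book. Reading A never influences reading B.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        interest = rng.random() < 0.5
        if force_read_a is None:
            # Observational setting: A depends on the hidden interest.
            read_a = rng.random() < (0.8 if interest else 0.1)
        else:
            # Intervention: we decide who reads A, ignoring interest.
            read_a = force_read_a
        # B depends only on interest, not on A.
        read_b = rng.random() < (0.7 if interest else 0.1)
        rows.append((read_a, read_b))
    return rows


def rate_b(rows, given_a=None):
    """Share of readers who read B, optionally restricted by read_a."""
    picked = [b for a, b in rows if given_a is None or a == given_a]
    return sum(picked) / len(picked)


# Observed data: A is a useful predictor of B.
observed = simulate(20000)
gap_observed = rate_b(observed, True) - rate_b(observed, False)

# Intervened data: forcing A on or off barely moves the rate of B.
forced_yes = simulate(20000, force_read_a=True, seed=1)
forced_no = simulate(20000, force_read_a=False, seed=2)
gap_intervened = rate_b(forced_yes) - rate_b(forced_no)

print(f"observed gap:   {gap_observed:.3f}")
print(f"intervened gap: {gap_intervened:.3f}")
```

Under these assumed probabilities, the observed gap is large while the intervened gap is close to zero: the feature is worth keeping in a predictive model, but it would be a mistake to read it as a cause.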
A whole different set of real-world problems that actually use the same
statistical methods (logistic regression, linear regression) as part of the
 