CHAPTER 11
Causality
Many of the models and examples in the book so far have been focused on the fundamental problem of prediction. We've discussed examples like in Chapter 8, where your goal was to build a model to predict whether or not a person would be likely to prefer a certain item, such as a movie or a book. There may be thousands of features that go into the model, and you may use feature selection to narrow those down, but ultimately the model is optimized to achieve the highest accuracy. When you're optimizing for accuracy, you don't necessarily worry about the meaning or interpretation of the features, and especially if there are thousands of features, it's well-nigh impossible to interpret them at all.
Additionally, you wouldn't even want to make the statement that certain characteristics caused the person to buy the item. So, for example, your model for predicting or recommending a book on Amazon could include a feature "whether or not you've read Wes McKinney's O'Reilly book Python for Data Analysis." We wouldn't say that reading his book caused you to read this one. It just might be a good predictor, which would have been discovered and come out as such in the process of optimizing for accuracy. We wish to emphasize here that this is not simply the familiar correlation-causation distinction you've perhaps had drilled into your head already, but rather that your intent when building such a model or system was never to understand causality at all, but rather to predict. And if your intent were to build a model that helps you get at causality, you would go about that in a different way.
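The point that a feature can be a strong predictor without being a cause can be sketched with a small simulation. Everything here is an illustrative assumption, not data from this chapter: a hidden "interest in data science" variable drives both reading McKinney's book and reading this one, so the two behaviors are correlated even though neither causes the other.

```python
import random

random.seed(0)

# Hypothetical confounder: each person's (unobserved) interest in
# data science drives BOTH reading behaviors independently.
n = 10_000
reads_mckinney = []
reads_this_book = []
for _ in range(n):
    interest = random.random()                 # hidden confounder in [0, 1)
    reads_mckinney.append(random.random() < interest)
    reads_this_book.append(random.random() < interest)

# Compare P(reads this book | read McKinney) with
# P(reads this book | did not read McKinney).
n_yes = sum(reads_mckinney)
p_given_yes = sum(b for a, b in zip(reads_mckinney, reads_this_book) if a) / n_yes
p_given_no = sum(b for a, b in zip(reads_mckinney, reads_this_book) if not a) / (n - n_yes)

# The large gap makes "read McKinney" a useful feature for a
# predictive model, even though neither reading caused the other.
print(round(p_given_yes, 2), round(p_given_no, 2))
```

A classifier optimized purely for accuracy would happily pick up this feature; only a causally oriented analysis, which asks what happens if we intervene on the feature, would reveal that it is merely a marker of the hidden interest.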
A whole different set of real-world problems that actually use the same
statistical methods (logistic regression, linear regression) as part of the