p-values
In the context of regression, where you're trying to estimate coefficients (the βs), thinking in terms of p-values means assuming a null hypothesis that the βs are zero. For any given β, the p-value captures the probability of observing a test statistic (in this case the estimated β) at least as extreme as the one you got, under the null hypothesis. Specifically, if you have a low p-value, it is highly unlikely that you would observe such a test statistic if the null hypothesis actually held. This translates to meaning that (with some confidence) the coefficient is highly likely to be non-zero.
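For concreteness, here is a minimal sketch (with simulated data, not an example from this chapter) of reading coefficient p-values off an ordinary least squares fit using Python's statsmodels; the variable names and the true coefficients below are made up for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 * x1 + rng.normal(size=n)   # x2's true coefficient is zero
X = sm.add_constant(np.column_stack([x1, x2]))
results = sm.OLS(y, X).fit()
# Expect a tiny p-value for x1 (evidence its beta is non-zero) and a
# large p-value for x2 (consistent with the null hypothesis beta = 0).
print(results.pvalues)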
AIC (Akaike Information Criterion)
Given by the formula 2k − 2 ln(L), where k is the number of parameters in the model and ln(L) is the "maximized value of the log likelihood." The goal is to minimize AIC.
BIC (Bayesian Information Criterion)
Given by the formula k ln(n) − 2 ln(L), where k is the number of parameters in the model, n is the number of observations (data points, or users), and ln(L) is the maximized value of the log likelihood. The goal is to minimize BIC.
Entropy
This will be discussed more in "Embedded Methods: Decision Trees" on page 184.
In practice
As mentioned, stepwise regression explores a large space of possible models, so there is a danger of overfitting: it will often fit much better in-sample than it does on new, out-of-sample data.
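To make that concrete, here is a minimal forward-stepwise sketch scored by AIC; the simulated data, the greedy add-one-feature-at-a-time rule, and the stopping condition are illustrative assumptions rather than a definitive implementation:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))                              # 5 candidate features
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)   # only 2 matter

def fit_aic(cols):
    # Fit OLS on the given feature columns and return the model's AIC.
    design = sm.add_constant(X[:, cols]) if cols else np.ones((n, 1))
    return sm.OLS(y, design).fit().aic

selected, remaining = [], list(range(X.shape[1]))
current_aic = fit_aic(selected)
while remaining:
    # Try adding each remaining feature; keep the one that lowers AIC most.
    best_aic, best_j = min((fit_aic(selected + [j]), j) for j in remaining)
    if best_aic >= current_aic:
        break                      # no feature improves AIC; stop searching
    selected.append(best_j)
    remaining.remove(best_j)
    current_aic = best_aic

print("selected features:", selected, "AIC:", round(current_aic, 1))

Even with AIC's penalty term, greedily searching over many subsets can still latch onto spurious features, which is exactly the overfitting risk noted above.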
You don't have to retrain models at each step of these approaches,
because there are fancy ways to see how your objective function (aka
selection criterion) changes as you change the subset of features you
are trying out. These are called “finite differences” and rely essentially
on Taylor Series expansions of the objective function.
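As a toy illustration of that Taylor-series idea (not the actual bookkeeping a stepwise implementation would use), the change in a simple objective under a small parameter step can be approximated from a single slope estimate instead of a full re-evaluation:

def objective(beta):
    # A hypothetical one-parameter objective, e.g. a residual sum of squares
    return (beta - 3.0) ** 2 + 1.0

beta, h = 1.0, 0.1
exact_change = objective(beta + h) - objective(beta)
# First-order approximation: f(beta + h) - f(beta) is roughly h * f'(beta),
# with f'(beta) itself estimated by a finite difference.
slope = (objective(beta + 1e-6) - objective(beta)) / 1e-6
approx_change = h * slope
print(exact_change, approx_change)   # the two agree closely for small h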
One last word: if you have a domain expert on hand, don't go down the machine learning rabbit hole of feature selection until you've tapped into your expert's knowledge completely!