Filters
Filters order possible features by a ranking based on a metric or statistic, such as correlation with the outcome variable. This is sometimes good on a first pass over the space of features, because it takes account of the predictive power of each individual feature. However, the problem with filters is that you can end up with correlated features. In other words, the filter doesn't care about redundancy. And by treating the features as independent, you're not taking possible interactions into account.
This isn't always bad and it isn't always good, as Isabelle Guyon explains. On the one hand, two redundant features can be more powerful when both are used; on the other hand, a feature that appears useless alone could actually help when combined with another seemingly useless feature, because of an interaction between them.
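To make that second point concrete, here's a minimal sketch (my own illustration, not from the text) using the classic XOR pattern, where each feature alone carries no signal about the outcome but the pair determines it exactly, so a one-feature-at-a-time filter would discard both:

    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.integers(0, 2, 10_000)
    x2 = rng.integers(0, 2, 10_000)
    y = x1 ^ x2  # the outcome depends only on the interaction of x1 and x2

    # Individually, each feature is (up to sampling noise) uncorrelated with y...
    print(np.corrcoef(x1, y)[0, 1])  # ~0
    print(np.corrcoef(x2, y)[0, 1])  # ~0
    # ...yet together x1 and x2 determine y exactly.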
Here's an example of a filter: for each feature, run a linear regression with only that feature as a predictor. Each time, note either the p-value or the R-squared, and rank the features from lowest p-value or highest R-squared (more on these two in "Selection criterion" on page 182).
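As a sketch of this filter in Python (my own illustration; SciPy's linregress handles the one-predictor case, and the helper name rank_features is made up):

    import numpy as np
    from scipy.stats import linregress

    def rank_features(X, y):
        # Fit a one-predictor linear regression for each column of X,
        # recording the p-value and R-squared, then rank by lowest p-value.
        scores = []
        for j in range(X.shape[1]):
            fit = linregress(X[:, j], y)
            scores.append((j, fit.pvalue, fit.rvalue ** 2))  # R-squared = r^2
        return sorted(scores, key=lambda s: s[1])

With a single predictor and a fixed sample size, ranking by lowest p-value and ranking by highest R-squared give the same ordering, so either works here.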
Wrappers
Wrapper feature selection tries to find subsets of features, of some fixed size, that will do the trick. However, as anyone who has studied combinations and permutations knows, the number of possible size-k subsets of n things, called "n choose k," grows explosively with n (across all sizes there are 2^n possible subsets). So there's a nasty opportunity for overfitting by doing this.
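To get a feel for how fast that blows up (a quick check, not from the text):

    from math import comb

    for n in (10, 20, 40):
        print(n, comb(n, n // 2), 2 ** n)
    # n=10:      252 subsets of size 5;    1,024 subsets in all
    # n=20:  184,756 subsets of size 10;   ~1.0e6 subsets in all
    # n=40:  ~1.4e11 subsets of size 20;   ~1.1e12 subsets in all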
There are two aspects of wrappers you need to consider: 1) selecting an algorithm to search over subsets of features, and 2) deciding on a selection criterion or filter for judging whether a set of features is "good."
Selecting an algorithm
Let's first talk about a set of algorithms that fall under the category of stepwise regression, a method for feature selection in which features are added to or subtracted from a regression model systematically, according to some selection criterion. There are three primary methods of stepwise regression: forward selection, backward elimination, and a combined approach (forward and backward).
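Here's a minimal sketch of forward selection, assuming statsmodels for the regression fits (the helper forward_select and the 0.05 p-value threshold are my own choices, not the book's):

    import statsmodels.api as sm

    def forward_select(X, y, threshold=0.05):
        # Greedily add the candidate feature whose coefficient has the
        # smallest p-value; stop when no remaining feature is significant.
        selected, remaining = [], list(range(X.shape[1]))
        while remaining:
            pvals = {}
            for j in remaining:
                model = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
                pvals[j] = model.pvalues[-1]  # p-value of the newly added feature
            best = min(pvals, key=pvals.get)
            if pvals[best] >= threshold:
                break
            selected.append(best)
            remaining.remove(best)
        return selected

Backward elimination runs the same loop in reverse: start with all the features and repeatedly drop the one with the largest p-value until everything left is significant.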