You can't use this penalty term for large coefficients and assume the "weighting of the features" problem is still solved, because in fact you'd be penalizing some coefficients way more than others if they start out on different scales. The easiest way to get around this is to normalize your variables before entering them into the model, similar to how we did it in Chapter 6. If you have some reason to think certain variables should have larger coefficients, then you can normalize different variables with different means and variances. At the end of the day, the way you normalize is again equivalent to imposing a prior.
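To make the scaling point concrete, here's a minimal sketch in Python (assuming NumPy and scikit-learn; the feature matrix X and target y are made up for illustration). The three columns carry the same signal on wildly different scales, so an L2 penalty on the raw data shrinks their coefficients unevenly; standardizing first puts them back on an equal footing.

```python
# A minimal sketch (hypothetical data; assumes NumPy and scikit-learn).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 3))
X = signal * np.array([1.0, 100.0, 0.01])   # same information, very different units
y = signal @ np.array([2.0, 2.0, 2.0]) + rng.normal(size=200)

raw = Ridge(alpha=10.0).fit(X, y)
print(raw.coef_)        # shrinkage hits the small-scale column far harder than the others

scaled = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X, y)
print(scaled.named_steps["ridge"].coef_)    # roughly equal coefficients, as intended
```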
A final problem with this prior stuff: although the problem will have
a unique solution (in the sense that the penalized objective will have a unique minimum) if
you make λ large enough, by that time you may not be solving the
problem you care about. Think about it: if you make λ absolutely huge,
then the coefficients will all go to zero and you'll have no model at all.
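You can watch this happen with a quick illustration (hypothetical data again; scikit-learn's Ridge calls the penalty weight alpha). As λ grows, every coefficient gets squashed toward zero and the model stops saying anything:

```python
# A quick illustration of coefficients collapsing as λ grows (hypothetical data).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.5]) + rng.normal(size=200)

for lam in [0.01, 1.0, 100.0, 1e6]:
    coefs = Ridge(alpha=lam).fit(X, y).coef_
    print(lam, np.round(coefs, 3))   # all coefficients head toward zero as λ grows
```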
The Dimensionality Problem
OK, so we've tackled the overfitting problem; now let's think about
overdimensionality, i.e., the idea that you might have tens of thousands
of items. We typically use both Singular Value Decomposition
(SVD) and Principal Component Analysis (PCA) to tackle this, and
we'll show you how shortly.
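As a rough preview, here's a sketch of dimension reduction with scikit-learn (a made-up matrix stands in for your data; TruncatedSVD and PCA are that library's implementations of the two decompositions named above). Tens of thousands of columns get compressed into a handful of latent dimensions:

```python
# A minimal sketch of SVD- and PCA-based dimension reduction (hypothetical data).
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10_000))   # e.g., 500 users by tens of thousands of items

svd = TruncatedSVD(n_components=20)  # keep 20 latent dimensions
X_svd = svd.fit_transform(X)         # shape (500, 20)

pca = PCA(n_components=20)           # PCA centers the data first, then runs an SVD
X_pca = pca.fit_transform(X)         # shape (500, 20)

print(X_svd.shape, X_pca.shape)
```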
To understand how this works before we dive into the math, let's think about how we reduce dimensions and create "latent features" internally every day. For example, people invent concepts like "coolness," but we can't directly measure how cool someone is. Other people exhibit different patterns of behavior, which we internally map or reduce to our one dimension of "coolness." So coolness is an example of a latent feature in that it's unobserved and not measurable directly, and we could think of it as reducing dimensions because perhaps it's a combination of many "features" we've observed about the person and implicitly weighted in our mind.
Two things are happening here: the dimensionality is reduced to a single feature, and that feature is latent rather than directly observed.
But in this algorithm, we don't decide which latent factors to care
about. Instead we let the machines do the work of figuring out what
the important latent features are. “Important” in this context means