You can't use this penalty term for large coefficients and assume the "weighting of the features" problem is still solved, because in fact you'd be penalizing some coefficients way more than others if they start out on different scales. The easiest way to get around this is to normalize your variables before entering them into the model, similar to how we did it in Chapter 6. If you have some reason to think certain variables should have larger coefficients, then you can normalize different variables with different means and variances. At the end of the day, the way you normalize is again equivalent to imposing a prior.
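To make the scaling point concrete, here's a minimal sketch in Python (assuming NumPy and scikit-learn; the feature matrix X and target y are made up for illustration). The three columns carry the same signal on wildly different scales, so an L2 penalty on the raw data shrinks their coefficients unevenly; standardizing first puts them back on an equal footing.

```python
# A minimal sketch (hypothetical data; assumes NumPy and scikit-learn).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 3))
X = signal * np.array([1.0, 100.0, 0.01])   # same information, very different units
y = signal @ np.array([2.0, 2.0, 2.0]) + rng.normal(size=200)

raw = Ridge(alpha=10.0).fit(X, y)
print(raw.coef_)        # shrinkage hits the small-scale column far harder than the others

scaled = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X, y)
print(scaled.named_steps["ridge"].coef_)    # roughly equal coefficients, as intended
```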
A final problem with this prior stuff: although the problem will have
a unique solution (in the sense that the penalized objective will have a unique minimum) if
you make λ large enough, by that time you may not be solving the
problem you care about. Think about it: if you make λ absolutely huge,
then the coefficients will all go to zero and you'll have no model at all.
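You can watch this happen with a quick illustration (hypothetical data again; scikit-learn's Ridge calls the penalty weight alpha). As λ grows, every coefficient gets squashed toward zero and the model stops saying anything:

```python
# A quick illustration of coefficients collapsing as λ grows (hypothetical data).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.5]) + rng.normal(size=200)

for lam in [0.01, 1.0, 100.0, 1e6]:
    coefs = Ridge(alpha=lam).fit(X, y).coef_
    print(lam, np.round(coefs, 3))   # all coefficients head toward zero as λ grows
```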
The Dimensionality Problem
OK, so we've tackled the overfitting problem; now let's think about
overdimensionality, i.e., the idea that you might have tens of thousands
of items. We typically use both Singular Value Decomposition
(SVD) and Principal Component Analysis (PCA) to tackle this, and
we'll show you how shortly.
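As a rough preview, here's a sketch of dimension reduction with scikit-learn (a made-up matrix stands in for your data; TruncatedSVD and PCA are that library's implementations of the two decompositions named above). Tens of thousands of columns get compressed into a handful of latent dimensions:

```python
# A minimal sketch of SVD- and PCA-based dimension reduction (hypothetical data).
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10_000))   # e.g., 500 users by tens of thousands of items

svd = TruncatedSVD(n_components=20)  # keep 20 latent dimensions
X_svd = svd.fit_transform(X)         # shape (500, 20)

pca = PCA(n_components=20)           # PCA centers the data first, then runs an SVD
X_pca = pca.fit_transform(X)         # shape (500, 20)

print(X_svd.shape, X_pca.shape)
```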
To understand how this works before we dive into the math, let's think about how we reduce dimensions and create "latent features" internally every day. For example, people invent concepts like "coolness," but we can't directly measure how cool someone is. Other people exhibit different patterns of behavior, which we internally map or reduce to our one dimension of "coolness." So coolness is an example of a latent feature in that it's unobserved and not measurable directly, and we could think of it as reducing dimensions because perhaps it's a combination of many "features" we've observed about the person and implicitly weighted in our mind.
Two things are happening here: the dimensionality is reduced to a single feature, and that feature is latent rather than directly observed.
But in this algorithm, we don't decide which latent factors to care
about. Instead we let the machines do the work of figuring out what
the important latent features are. “Important” in this context means