feature vector x_i. When you measure the likelihood that an average
user clicks on an ad, the base rate corresponds to the click-through
rate, i.e., the tendency over all users to click on ads. This is
typically on the order of 1%.
If you had no information about your specific situation except the base
rate, the average prediction would be given by just α:
$$P(c_i = 1) = \frac{1}{1 + e^{-\alpha}}$$
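To make the role of α concrete, here is a minimal sketch that recovers the α matching a given base rate. It assumes the inverse-logit convention 1 / (1 + e^(-α)); the 1% figure is only the illustrative click-through rate mentioned above, and the function names are hypothetical.

```python
import math

def inverse_logit(t: float) -> float:
    """Map any real number t to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def logit(p: float) -> float:
    """Log-odds of a probability p; the inverse of inverse_logit."""
    return math.log(p / (1.0 - p))

base_rate = 0.01           # illustrative click-through rate of about 1%
alpha = logit(base_rate)   # the alpha that reproduces this base rate

print(round(alpha, 3))             # about -4.595
print(inverse_logit(alpha))        # recovers the base rate, about 0.01
```

A base rate well below 50% therefore corresponds to a negative α; a base rate of exactly 50% would give α = 0.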
The variable β defines the slope of the logit function. Note that in
general it's a vector that is as long as the number of features you are
using for each data point. The vector β determines the extent to which
certain features are markers for increased or decreased likelihood to
click on an ad.
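As a rough illustration of how β shifts the prediction away from the base rate, the sketch below assumes the click probability is the inverse logit of α + β · x_i. The two features and all parameter values are made up for the example.

```python
import math

def click_probability(alpha, beta, x):
    """Inverse logit of alpha + beta . x (assumed model form)."""
    score = alpha + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-score))

alpha = -4.6           # hypothetical intercept, roughly a 1% base rate
beta = [1.2, -0.8]     # hypothetical weights, one per feature

p_base = click_probability(alpha, beta, [0, 0])  # no features active
p_up = click_probability(alpha, beta, [1, 0])    # positive-weight feature on
p_down = click_probability(alpha, beta, [0, 1])  # negative-weight feature on

print(p_base, p_up, p_down)
```

A feature with a positive entry in β raises the predicted probability above the base rate, and one with a negative entry lowers it.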
Estimating α and β
Your immediate modeling goal is to use the training data to find the
best choices for α and β. In general you want to solve this with
maximum likelihood estimation, using a convex optimization algorithm
because the negative log-likelihood is convex (equivalently, the
log-likelihood is concave). You can't just set derivatives to zero and
solve with vector calculus like you did with linear regression: the
likelihood is a nonlinear function of the parameters, and in
particular there is no closed-form solution, so the optimum has to be
found iteratively.
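One simple iterative approach is plain gradient descent on the average negative log of the likelihood. The sketch below is only an illustration under stated assumptions: the model form (inverse logit of α + β · x), the synthetic data, the learning rate, and the step count are all chosen for the example, and in practice a library optimizer would normally be used instead.

```python
import math
import random

def sigmoid(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def score(alpha, beta, x):
    return alpha + sum(b * xi for b, xi in zip(beta, x))

def neg_log_likelihood(alpha, beta, xs, cs):
    """Average negative log-likelihood of clicks cs given features xs."""
    total = 0.0
    for x, c in zip(xs, cs):
        p = min(max(sigmoid(score(alpha, beta, x)), 1e-12), 1 - 1e-12)
        total -= c * math.log(p) + (1 - c) * math.log(1 - p)
    return total / len(xs)

def fit(xs, cs, lr=0.5, steps=1000):
    """Gradient descent on the average negative log-likelihood."""
    n, k = len(xs), len(xs[0])
    alpha, beta = 0.0, [0.0] * k
    for _ in range(steps):
        g_alpha, g_beta = 0.0, [0.0] * k
        for x, c in zip(xs, cs):
            err = sigmoid(score(alpha, beta, x)) - c  # predicted minus observed
            g_alpha += err
            for j in range(k):
                g_beta[j] += err * x[j]
        alpha -= lr * g_alpha / n
        beta = [b - lr * g / n for b, g in zip(beta, g_beta)]
    return alpha, beta

# Synthetic data generated from made-up "true" parameters.
rng = random.Random(0)
true_alpha, true_beta = -1.0, [2.0]
xs = [[rng.uniform(-2, 2)] for _ in range(200)]
cs = [1 if rng.random() < sigmoid(score(true_alpha, true_beta, x)) else 0
      for x in xs]

alpha_hat, beta_hat = fit(xs, cs)
print(alpha_hat, beta_hat)
```

Because the objective is convex, gradient descent with a small enough step size heads toward the global minimum rather than getting stuck in a local one.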
Denote by Θ the pair (α, β). The likelihood function L is defined by:
$$L(\Theta \mid X_1, X_2, \ldots, X_n) = P(X \mid \Theta) = P(X_1 \mid \Theta) \cdots P(X_n \mid \Theta)$$
where you are assuming the data points X_i are independent, and
i = 1, ..., n indexes your n users. This independence assumption
corresponds to saying that the click behavior of any given user doesn't
affect the click behavior of any other user; in this case, “click
behavior” means “probability of clicking.” It's a relatively safe
assumption at a given point in time, but not forever. (Remember, the
independence assumption is what allows you to express the likelihood
function as the product of the densities for each of the n
observations.)
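The product form can be checked numerically. The sketch below assumes each observation is a Bernoulli click outcome c_i with predicted probability p_i, so that a user contributes p_i when they click and 1 - p_i when they don't; the probabilities and outcomes are invented for the example.

```python
import math

def likelihood(ps, cs):
    """Product of per-user probabilities of the observed outcomes."""
    result = 1.0
    for p, c in zip(ps, cs):
        result *= p if c == 1 else 1.0 - p
    return result

def log_likelihood(ps, cs):
    """Same quantity on the log scale: a sum instead of a product."""
    return sum(math.log(p if c == 1 else 1.0 - p) for p, c in zip(ps, cs))

ps = [0.02, 0.01, 0.05, 0.01]  # hypothetical predicted click probabilities
cs = [0, 0, 1, 0]              # hypothetical outcomes (1 = clicked)

L = likelihood(ps, cs)         # 0.98 * 0.99 * 0.05 * 0.99
print(L)                       # about 0.048
print(math.exp(log_likelihood(ps, cs)))  # same value, up to rounding
```

With many users the raw product becomes too small to represent in floating point, which is one reason to work with the sum of logs instead; both have the same maximizer.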
You then search for the parameters that maximize the likelihood,
having observed your data: