for a certain problem. When we come to the conclusion that seems to be the
right technical solution, we then typically reimplement it and tune it toward
the exact setting we need. Then we have this as the in-house solution.
Gutierrez: What types of non-special-purpose tools do you use for data
analysis?
Perlich: We still do a lot with R, but that requires more kinds of downsampling.
You cannot run stochastic gradient descent on the data set sizes that we
want. Or at least I do not know how to do it, put it this way, so I will leave that
to others. Occasionally we do data visualizations, though mostly for
communication purposes. There is nothing really to look at in our world of very high
dimensional models. For the data visualization, I have played around a little
bit with KML [Keyhole Markup Language] files for making maps. We also use
D3.js for our customer-facing side, where we actually show graphs of the stats
on the campaigns that we run. The consumer-facing side is more the analytics
team. So it is not so much the data science team that is involved in that part.
Gutierrez: What lessons have you learned from using these tools to
transform a toy project into a production system?
Perlich: I am not sure it falls under lessons for these tools specifically, but as
usual, when you start looking into a data set that nobody has paid that close
attention to, you end up finding things that you did not expect to see. For
instance, speaking about the instance of bids to performance issues, we
realized that on some inventories, meaning URLs, we were always paying what we
were bidding. Now, according to the rules, this is a second-price auction. This
is not supposed to happen. So we found a couple of cases where we felt that
the way the billing system was set up was not necessarily correct. We also
found instances where we had just hard-coded the minimum bid price in the
wrong way. I guess the overarching lesson is that if nobody looks at a data set
for more than a month, it becomes useless pretty quickly because it is actually
almost totally wrong somewhere, so only regularly looked-at and worked-
with data sets are reliable.
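
The check described above can be sketched in a few lines. This is only an illustration, assuming a hypothetical log of won auctions with one (inventory, bid, paid) tuple per win; the function name, field layout, thresholds, and sample values are all assumptions, not the system discussed in the interview. In a second-price auction the winner normally pays less than its bid, so an inventory where the price paid almost always equals the bid is worth a closer look (ties and floor prices can explain occasional matches).

```python
from collections import defaultdict


def flag_first_price_inventories(records, min_auctions=3, threshold=0.95, tol=1e-9):
    """Return inventories where the price paid (almost) always equals the bid.

    records: iterable of (inventory, bid, paid) tuples for auctions we won.
    All names and default thresholds here are illustrative assumptions.
    """
    won = defaultdict(int)       # auctions won per inventory
    paid_bid = defaultdict(int)  # of those, how many cleared at our own bid

    for inventory, bid, paid in records:
        won[inventory] += 1
        if abs(bid - paid) <= tol:
            paid_bid[inventory] += 1

    flagged = []
    for inventory, n in won.items():
        share = paid_bid[inventory] / n
        # Skip thinly traded inventories; a few coincidental matches prove nothing.
        if n >= min_auctions and share >= threshold:
            flagged.append(inventory)
    return sorted(flagged)


# Hypothetical sample log, not real data.
sample = [
    ("site-a.example", 2.00, 1.40),
    ("site-a.example", 1.50, 1.10),
    ("site-a.example", 3.00, 2.25),
    ("site-b.example", 2.00, 2.00),
    ("site-b.example", 1.50, 1.50),
    ("site-b.example", 3.00, 3.00),
]

print(flag_first_price_inventories(sample))  # ['site-b.example']
```

A flagged inventory is a prompt for investigation, not proof of a billing error; the thresholds would need tuning against real volumes.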
Even if the project had been a complete failure for all other reasons,
I think it found enough issues in our setup that it was very well worth having
me spend three days on it. This is something we realize again and again—side
observations and insights almost always add value beyond the primary
purpose. This kind of extra value happens very consistently whenever you look
at data.
Another stunning example of learning lessons from really looking at the data
is what happened when exploring a fraud case recently. My CTO came in and
said, “Look guys. You managed to double performance in the last two weeks
across all our campaigns. Do you have anything to tell me?” And we scratched
our heads, because we had not really done anything. The only thing we had
 