Claudia Perlich - Data Scientists at Work

for a certain problem. When we come to the conclusion that something seems to be the right technical solution, we then typically reimplement it and tune it toward the exact setting we need. Then we have this as the in-house solution.

Gutierrez: What types of non-special-purpose tools do you use for data analysis?

Perlich: We still do a lot with R, but that requires more kinds of downsampling. You cannot run stochastic gradient descent on the data set sizes that we want. Or at least I do not know how to do it, put it this way, so I will leave that to others. Occasionally we do data visualizations, though mostly for communication purposes. There is nothing really to look at in our world of very high dimensional models. For the data visualization, I have played around a little bit with KML [Keyhole Markup Language] files for making maps. We also use D3.js for our customer-facing side, where we actually show graphs of the stats on the campaigns that we run. The customer-facing side is more the analytics team. So it is not so much the data science team that is involved in that part.

Gutierrez: What lessons have you learned from using these tools to transform a toy project into a production system?

Perlich: I am not sure it falls under lessons for these tools specifically, but as usual, when you start looking into a data set that nobody has paid that close attention to, you end up finding things that you did not expect to see. For instance, speaking of the bid-to-performance issue, we realized that on some inventories, meaning URLs, we were always paying what we were bidding. Now, according to the rules, this is a second-price auction. This is not supposed to happen. So we found a couple of cases where we felt that the way the billing system was set up was not necessarily correct. We also found instances where we had just hard-coded the minimum bid price in the wrong way. I guess the overarching lesson is that if nobody looks at a data set for more than a month, it becomes useless pretty quickly because it is actually almost totally wrong somewhere, so only regularly looked-at and worked-with data sets are reliable.
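
As a rough illustration of the kind of check described above (not the team's actual code): in a second-price auction the winner normally pays a price set by the runner-up's bid rather than its own, so an inventory where the price paid always equals the bid stands out. The Python sketch below assumes hypothetical records of (inventory, bid, price paid) for won auctions. The function name, the thresholds, and the sample data are all illustrative assumptions.

```python
from collections import defaultdict


def flag_first_price_inventories(records, min_auctions=3, threshold=0.95, tol=1e-9):
    """Return {inventory: share} for inventories where the price paid
    equals our own bid in at least `threshold` of the won auctions.

    `records` is an iterable of (inventory, bid, paid) tuples. This layout
    is a hypothetical one chosen for the example.
    """
    won = defaultdict(int)        # won auctions per inventory
    paid_full = defaultdict(int)  # won auctions where paid == bid

    for inventory, bid, paid in records:
        won[inventory] += 1
        if abs(paid - bid) <= tol:
            paid_full[inventory] += 1

    flagged = {}
    for inventory, n in won.items():
        share = paid_full[inventory] / n
        # Require a minimum sample so one coincidental tie is not flagged.
        if n >= min_auctions and share >= threshold:
            flagged[inventory] = share
    return flagged


# Made-up sample data: site-b always charges exactly what was bid.
records = [
    ("site-a.example", 2.00, 1.40),
    ("site-a.example", 1.50, 1.10),
    ("site-a.example", 3.00, 2.25),
    ("site-b.example", 2.00, 2.00),
    ("site-b.example", 1.50, 1.50),
    ("site-b.example", 3.00, 3.00),
]
suspicious = flag_first_price_inventories(records)
print(suspicious)
```

A flagged inventory is only a prompt to look closer. Paying the full bid can be legitimate when, say, a price floor or a tie sets the clearing price, which is why the sketch reports a share per inventory instead of a verdict.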

Even if the project had been a complete failure for all other reasons, I think it found enough issues in our setup that it was very well worth having me spend three days on it. This is something we realize again and again: side observations and insights almost always add value beyond the primary purpose. This kind of extra value happens very consistently whenever you look at data.

Another stunning example of learning lessons from really looking at the data is what happened when exploring a fraud case recently. My CTO came in and said, “Look guys. You managed to double performance in the last two weeks across all our campaigns. Do you have anything to tell me?” And we scratched our heads, because we had not really done anything. The only thing we had
