Claudia Perlich - Data Scientists at Work


Perlich: When we interviewed Melinda, we looked at a project we had

recently worked on. We said, “We have to optimize Nielsen reports.” Nielsen

is one of the companies that provide feedback on advertising campaigns. For

instance, they may tell you that of all the ads that you showed, females saw

73 percent of the ads. The interesting part about this is Nielsen has some

internal panel. That panel does not cover all the people you showed ads to

but just some subset. Part of this panel is then matched against Facebook.

Then they figure out from this percentage that was on Facebook which ones

self-identified as being female. Whether or not they are female is a separate

question. But this is the basis of the report that tells you that females saw 73

percent of the ads. So the data is some subset of some subset and it is hard to

tell whether ultimately it is a representative sample of my ads. And now the

problem is that I am then supposed to optimize this without having any access

to any of the underlying data. And it is not a standard predictive modeling setup, because

I never get the answer for any individual instance or person. I only get aggregate feedback on sets of a

hundred thousand impressions.

This is a problem we had been working on and thinking about recently. Internally,

we had brainstormed about it and had basically developed a methodology.

So when we interviewed Melinda, we asked her questions like: “How can you

optimize it?” and “How can you build a model to optimize for females, if this

is what you want.” This is not something we typically want, but we wanted to

hear her thought process. We said, “Tell us what to do about it. You have an

hour. Ask questions if you want to. This is a problem we are working on right

now.” It was quite interesting to have this conversation.

Gutierrez: How did Melinda approach the problem?

Perlich: Melinda went into probability theory, saying, “You have one group

that is 80 percent female. This other group is 70 percent female. The

intersection: Should it be higher than 70 percent or should it be lower? Is the fact

that you show up in both of them increasing my belief that you are female, or

decreasing it?”
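The question of whether the intersection should come out higher or lower can be made concrete with a small calculation. The sketch below is not from the interview: it assumes that membership in the two groups is conditionally independent given gender (a naive Bayes assumption), and it introduces a hypothetical `prior` parameter for the overall female share of the audience, which the interview does not state. Under those assumptions the answer depends on the base rate.

```python
def combine_independent(p_a: float, p_b: float, prior: float) -> float:
    """Estimate P(female | in group A and in group B).

    Assumes membership in A and in B is conditionally independent given
    gender. `prior` is a hypothetical overall female share of the audience.
    """
    def odds(p: float) -> float:
        return p / (1.0 - p)

    # With conditional independence, the posterior odds are the product of
    # the two group odds divided by the prior odds.
    posterior_odds = odds(p_a) * odds(p_b) / odds(prior)
    return posterior_odds / (1.0 + posterior_odds)


# Two groups reported as 80 percent and 70 percent female, under three
# hypothetical base rates.
for prior in (0.5, 0.8, 0.9):
    print(prior, round(combine_independent(0.8, 0.7, prior), 3))
```

With a low base rate, appearing in both groups raises the estimate above either group's share; with a base rate above both shares, it lowers the estimate. If the groups overlap heavily or are strongly correlated, the independence assumption fails and this formula no longer applies.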

So we discussed how to go at this problem with the Bayesian theory of

probabilities, in particular where it was possible to assume independence versus

overlapping, and so on. Ultimately this idea was not what we implemented

since the overlap was not sufficient. But we did take some of the ideas

forward and made it into a predictive modeling task: “Well, let's use it to

randomly label examples. Let's get a whole bunch of those that Nielsen thinks are

female. If they say it's 80 percent, then we will label these things as female with

80 percent probability.” We did this for all kinds of segments and then actually

built a model on it. So we faked the outcome, and built the model based on

probabilities, which is, in fact, what we ended up building.
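The labeling idea described here can be sketched in a few lines. This is an illustration under stated assumptions, not the model that was actually built: the segment names, the 30 percent share for the second segment, the single numeric feature, and the hand-rolled logistic regression are all hypothetical. Only the core step comes from the interview: each example is labeled female with probability equal to the share reported for its segment, and a model is then trained on those sampled labels.

```python
import math
import random

# Hypothetical segments with the female share an aggregate report gives each.
SEGMENT_RATES = {"seg_a": 0.8, "seg_b": 0.3}


def sample_labels(segments, rates, rng):
    """Label each example 1 ("female") with its segment's reported share."""
    return [1 if rng.random() < rates[s] else 0 for s in segments]


def predict(w, b, x):
    """Logistic model: estimated probability that the label is 1."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Fit logistic regression with plain batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_toy_data(n, rng):
    """Hypothetical data: one feature whose mean depends on the segment."""
    segments, X = [], []
    for _ in range(n):
        seg = rng.choice(["seg_a", "seg_b"])
        center = 1.0 if seg == "seg_a" else -1.0
        segments.append(seg)
        X.append([rng.gauss(center, 0.5)])
    return segments, X


rng = random.Random(0)
segments, X = make_toy_data(1000, rng)
y = sample_labels(segments, SEGMENT_RATES, rng)  # the "faked" outcomes
w, b = fit_logistic(X, y)
print(predict(w, b, [1.0]), predict(w, b, [-1.0]))
```

Because the labels are sampled rather than observed, any individual label may be wrong; the model can only recover patterns that hold across the segments in aggregate.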

Gutierrez: The interview is basically a working session then.
