Smallwood: I would say hands down it's A/B testing. When I was first
exposed to it many years ago as “web analytics,” I didn't have much
respect for it. It seemed very straightforward and trite to me. I didn't really
get the power of it until I came to Netflix. What's interesting to me is that
people's intuition is wrong so often, even when you're an expert in an area.
Even when you have a lot of domain and background in an area, your intuition
to answer any one question is still wrong sometimes.
Similarly, with predictive models we get excited when we see an AUC of 0.75.
We're like, “Oh my gosh! That's a great model!” But it's still wrong a lot of the
time, and it's only through experimentation that you can actually get a causal
read on something. And it is so fun and fascinating to watch. We get to do
both of these things at Netflix: We get to develop the alternate algorithm, and
then we get to test it. We may initially think, “This algorithm will be so much
better than what we had before! It has an amazing AUC.” Or whatever we
measured offline—perhaps MRR [Mean Reciprocal Rank]. And then we test
the model and instead it's worse than what we had before. There will be one
surprising result after another. And to me, it's the power of getting that real
causality that just completely rocked my world of thinking about data.
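For illustration only (this sketch is not from the interview), getting a "causal read" from a single A/B cell can come down to something as simple as a two-proportion z-test on a conversion-style metric. The ab_ztest helper and the sample counts below are invented for the example.

# Illustrative only: a minimal two-proportion z-test of the kind used to
# read out an A/B cell. The counts are made up, not Netflix data.
from math import sqrt
from statistics import NormalDist

def ab_ztest(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical cells: control vs. an algorithm that looked better offline.
z, p = ab_ztest(conv_a=1150, n_a=10000, conv_b=1210, n_b=10000)
print(f"z = {z:.2f}, p = {p:.3f}")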
Gutierrez: Do you codify what you learn from each A/B test?
Smallwood: We definitely try to look at themes of things we learned across
the tests, but the focus is more on where else we can do testing that we're not
doing yet. We would love to test in the content space to learn more about the
titles and catalog makeup that are most important to our customers, but we
don't want to test things that are a negative experience for customers. So we
haven't and won't do that. We've debated minimal experiments like, “What if
we just took one title out of our library and tried to see if we could measure
an impact?” Still, not only do we have contractual agreements with the studios,
but we also don't want to degrade the experience for our customers. But we
do always think about whether there is anything we could do to experiment
in the content space to help inform those decisions.
Gutierrez: Do you do A/B testing in your personal life?
Smallwood: Not quite A/B testing, but when I was young, I used to build
dorky little models for my personal decisions. I'd figure out what all the
attributes were for decision A versus decision B, and come up with my
weights for how important each of those attributes was.
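As an illustration (again, not from the interview), the kind of weighted-attribute scoring she describes can be sketched in a few lines; the attributes, weights, and ratings below are invented for the example.

# Illustrative only: a simple weighted-attribute model for choosing
# between two options. Names, weights, and ratings are invented.
weights = {"cost": 0.4, "time": 0.25, "enjoyment": 0.35}

# Each option is rated 0-10 on every attribute.
options = {
    "A": {"cost": 7, "time": 5, "enjoyment": 8},
    "B": {"cost": 9, "time": 4, "enjoyment": 6},
}

def score(ratings, weights):
    """Weighted sum of attribute ratings."""
    return sum(weights[attr] * ratings[attr] for attr in weights)

for name, ratings in options.items():
    print(name, round(score(ratings, weights), 2))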
Gutierrez: You have mathematicians, operations researchers, statisticians,
and data scientists in your group. How do you classify them?