Smallwood: I would say hands down it's A/B testing. When I was first
exposed to it many years ago as “web analytics,” I didn't have much
respect for it. It seemed very straightforward and trite to me. I didn't really
get the power of it until I came to Netflix. What's interesting to me is that
people's intuition is wrong so often, even when you're an expert in an area.
Even when you have a lot of domain and background in an area, your intuition
to answer any one question is still wrong sometimes.
Similarly, with predictive models we get excited when we see an AUC of 0.75.
We're like, “Oh my gosh! That's a great model!” But it's still wrong a lot of the
time, and it's only through experimentation that you can actually get a causal
read on something. And it is so fun and fascinating to watch. We get to do
both of these things at Netflix: We get to develop the alternate algorithm, and
then we get to test it. We may initially think, “This algorithm will be so much
better than what we had before! It has an amazing AUC.” Or whatever we
measured offline—perhaps MRR [Mean Reciprocal Rank]. And then we test
the model and instead it's worse than what we had before. There will be one
surprising result after another. And to me, it's the power of getting that real
causality that just completely rocked my world of thinking about data.
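For illustration only (this sketch is not from the interview), getting a "causal read" from a single A/B cell can come down to something as simple as a two-proportion z-test on a conversion-style metric. The ab_ztest helper and the sample counts below are invented for the example.

# Illustrative only: a minimal two-proportion z-test of the kind used to
# read out an A/B cell. The counts are made up, not Netflix data.
from math import sqrt
from statistics import NormalDist

def ab_ztest(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical cells: control vs. an algorithm that looked better offline.
z, p = ab_ztest(conv_a=1150, n_a=10000, conv_b=1210, n_b=10000)
print(f"z = {z:.2f}, p = {p:.3f}")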
Gutierrez: Do you codify what you learn from each A/B test?
Smallwood: We definitely try to look at themes of things we learned across
the tests, but the focus is more on where else we can do testing that we're not
doing yet. We would love to test in the content space to learn more about the
titles and catalog makeup that are most important to our customers, but we
don't want to test things that are a negative experience for customers. So we
haven't and won't do that. We've debated minimal experiments like, “What if
we just took one title out of our library and tried to see if we could measure
an impact?” Still, not only do we have contractual agreements with the studios,
but we also don't want to degrade the experience for our customers. But we
do always think about whether there is anything we could do to experiment
in the content space to help inform those decisions.
Gutierrez: Do you do A/B testing in your personal life?
Smallwood: Not quite A/B testing, but when I was young, I used to build
dorky little models for my personal decisions. I'd figure out what all the
attributes were for decision A versus decision B, and come up with my
weights for how important each of those attributes was.
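As an illustration (again, not from the interview), the kind of weighted-attribute scoring she describes can be sketched in a few lines; the attributes, weights, and ratings below are invented for the example.

# Illustrative only: a simple weighted-attribute model for choosing
# between two options. Names, weights, and ratings are invented.
weights = {"cost": 0.4, "time": 0.25, "enjoyment": 0.35}

# Each option is rated 0-10 on every attribute.
options = {
    "A": {"cost": 7, "time": 5, "enjoyment": 8},
    "B": {"cost": 9, "time": 4, "enjoyment": 6},
}

def score(ratings, weights):
    """Weighted sum of attribute ratings."""
    return sum(weights[attr] * ratings[attr] for attr in weights)

for name, ratings in options.items():
    print(name, round(score(ratings, weights), 2))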
Gutierrez: You have mathematicians, operations researchers, statisticians,
and data scientists in your group. How do you classify them?