to connect data to people and actions that people might take brought me a
new sort of thrill that I hadn't had before. To me, discovering things about
people through their data was a really cool thing. One thing led to another
and I worked on more and more analytic products and components.
Eventually Netflix called me. The very first time someone called me from Netflix,
I thought, “Oh my gosh. I'll bet their data is amazing! I would love to work on
Netflix data.” So I didn't have to think too hard about joining Netflix.
Gutierrez: What was the specific aha! moment where you thought personalization
models made sense?
Smallwood: I think it was really during my time at Yahoo!, where for the first
time I had massive data at my fingertips. It's just so exciting to see how much
variety there is in the world. When you start looking at user-level profiles of
information—of pretty much any kind of user-generated data—and you're
aggregating at the user level to try to understand the head and the tail and the
incredible diversity within the human population, it's very obvious how different
people are. That to me is fascinating. How can you build things that can
satisfy the whole population? That's an exciting problem.
Gutierrez: When you came to Netflix, what was the first data set you
worked with?
Smallwood: The first data I worked with, about four years ago, was our viewing
data, which was our largest data. At that point in time, my role was slightly
different than it is now. It was tilted slightly more toward the data engineering
side than it is now. The project involved an overhaul of the viewing data and the
data engineering behind it. At that point, even though the data was much smaller
than it is now, we could see the trajectory that we were on and we knew we
needed to redesign how that data was represented so that it could both scale
and still be granular enough for the things we wanted to study then and in the
future.
The project involved understanding all the data we were collecting at the
log level and what it included. This data set includes every segment of every
stream at every bit rate. So while you are watching a movie, the bit rate is flapping
around and we're serving you many, many streams that come together to
form one view that you see as the customer. So, it's just a tremendous amount
of data. We also collect every action you take, like pausing and rewinding, and
the bit rate at which you are watching. This data set also includes your bandwidth
changes, network congestion, rebuffer events, or whatever else may happen
while you are trying to watch something on the service. As you can imagine,
this volume and detail make things both daunting and fun.
Gutierrez: Internet entertainment as an industry is a little bit hard to nail
down. What are the main types of problems the industry is tackling?
 