At Netflix, it's quite the opposite, from the standpoint that we are very inten-
tional about what data we capture and how we capture it. We try to embed
any critical business logic that we know we're going to want to apply, no mat-
ter what we do with the data, at the point of capture. This makes just about
everything much easier. Sometimes we get that wrong. And more work falls
to the engineering teams when that's the case because then they have to go
back, detangle what we did, and rework something new, which can be painful.
But you're going to have that no matter what at some layer of the data stack,
and we have found that it's easier to do that at the source where you can.
Now, you can't always do that, because you want granular data that you can
aggregate in as many ways as you want to later when you think of new ideas.
But there are certain things you know that you really don't need. We try to
weed those things out as strongly as possible, but you're never one hundred
percent right. You're occasionally going to have to go back and rework things.
You just want to try to minimize that.
Gutierrez: How do you think about the technology selection for the data stack?
Smallwood: This is a hard one because technology, especially in the data
space, evolves more quickly than most companies can evolve. This is true
especially at the data warehousing level, whether it's in the cloud or dedicated
warehousing. There are so many different broad mechanisms, and once you've
built a lot of infrastructure within your company, it's incredibly expensive to
switch over to some new technology.
We use Teradata for a large part of our data warehousing. If we wanted to
move from Teradata to some other data-center-oriented warehousing sys-
tem, we would have so much to move that it would be a year's worth of work
for the entire data organization. Perhaps not quite that much, but it would be
a lot of work. So the farther upstream you are in your stack, the harder it is, I
think, to change technologies.
That said, we have been, like many companies, moving more and more toward
cloud-based analytics. When I first started at Netflix, pretty much all of our
data was in Teradata and we had a little bit of data in the cloud. Netflix was
just moving toward serving our whole product off of the cloud, so as you can
imagine, that meant we started having more and more data in the cloud. With
this evolution of the product occurring, what has worked for us is doing par-
allel development. We've moved from having the majority of our analytics in
Teradata to now having the majority of our analytics in the cloud.
We do still have a formidable amount of data in Teradata, but we've switched
our philosophy. We have aggregate data we use for ongoing reporting in
Teradata. We have granular data we use more for the modeling in the cloud,
and all sorts of analytics go in both places. But we have more data in the cloud,
as it's closer to the point of capture. And it's worked really nicely for us to
 