Gutierrez: What data goes into your predictive models now?
Perlich: What we have in terms of data is partial URL history of actual visitations
that we receive from data providers and bid requests from advertising
exchanges where we actually buy the impression. Unless a person has cookies
disabled, we can observe some of the sites the person has been to. Before we
use the URLs, we encode and hash them, as we are not interested in the web
page's content or what the actual URL was. What we care about is whether
a person's browsing history shows that they have or have not visited any one
of the millions of URLs that the data stream contains.
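As a rough illustration of the hashing step described above, the following Python sketch turns each URL into an opaque token and keeps only the set of tokens seen for one cookie. The interview does not say which encoding or hash function is used, so the truncated SHA-256 scheme and the names `hash_url` and `visited_tokens` are assumptions made for this example, and the URLs are invented.

```python
import hashlib

def hash_url(url: str, digest_chars: int = 16) -> str:
    """Map a URL to an opaque token (assumed scheme: truncated SHA-256)."""
    cleaned = url.strip()
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()[:digest_chars]

def visited_tokens(urls):
    """Collapse one cookie's browsing history into a set of hashed tokens.

    Only membership is kept: the page content and the readable URL are
    discarded, and repeat visits to the same URL collapse into one token.
    """
    return {hash_url(u) for u in urls}

# Invented browsing history for a single cookie.
history = [
    "https://photos.example/album/123",
    "https://blog.example/post/42",
    "https://photos.example/album/123",  # repeat visit
]
tokens = visited_tokens(history)

# The only question asked of the data: has this cookie visited a given URL?
has_visited = hash_url("https://blog.example/post/42") in tokens
never_seen = hash_url("https://retail.example/cart") in tokens
```

Because the lookup works on tokens alone, the same code applies to any kind of URL, whether it points to a photo, a video, a blog, or a retail page.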
From this, we now basically have a binary indicator for each of millions of URLs for
any one person—actually cookie, as a person can use multiple computers and
more than one person could be using the same computer. Based on this data,
we can then predict whether the person will buy a product based on having
seen a couple thousand other people on the website of the product/brand
we are working with. This works very generically on any type of URL data,
because we do not have to rely on the data being meaningful for the hashing.
It could be a photo-sharing URL, it could be a video URL, it could be a blog
URL, it could be a retail site URL, or some other type of completely different
website URL. Because we use binary indicators, we are able to use this very
generic representation of the data that lends itself to all kinds of URL data
for the models. This is now the core value proposition and in some sense it
supports privacy as we are not interested in extracting meaningful behavior
patterns or linking them back to a particular person.
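The binary-indicator idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of Dstillery's system: the passage does not name a model, so logistic regression trained by stochastic gradient descent is swapped in for illustration, and the feature-space size `DIM`, the function names, and the toy URLs and labels are all invented.

```python
import hashlib
import math

DIM = 2 ** 20  # assumed size of the hashed feature space

def url_index(url: str) -> int:
    """Hash a URL to a feature index; the URL's meaning is never inspected."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DIM

def featurize(urls):
    """Sparse binary indicators: the set of indices whose value is 1."""
    return {url_index(u) for u in urls}

def predict(weights, bias, active):
    """Estimated probability of a purchase for one cookie."""
    z = bias + sum(weights.get(i, 0.0) for i in active)
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=100, lr=0.5):
    """Fit logistic regression with plain SGD over sparse binary features."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for active, label in examples:
            error = predict(weights, bias, active) - label
            bias -= lr * error
            for i in active:
                weights[i] = weights.get(i, 0.0) - lr * error
    return weights, bias

# Invented toy data: (browsing history, 1 if the cookie later bought).
raw = [
    (["https://shoes.example/running", "https://blog.example/marathon"], 1),
    (["https://shoes.example/running", "https://video.example/clip"], 1),
    (["https://news.example/politics", "https://video.example/clip"], 0),
    (["https://news.example/politics", "https://photos.example/cats"], 0),
]
examples = [(featurize(urls), label) for urls, label in raw]
weights, bias = train(examples)

# Score two new cookies that each visited a single URL.
p_runner = predict(weights, bias, featurize(["https://shoes.example/running"]))
p_news = predict(weights, bias, featurize(["https://news.example/politics"]))
```

In this toy run, `p_runner` comes out above 0.5 and `p_news` below it, because the first URL appears only in histories of buyers and the second only in histories of non-buyers. Storing just the active indices keeps each cookie's vector small even when the indicator space covers millions of URLs.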
Gutierrez: How do you and other data scientists fit into Dstillery?
Perlich: Right now, our team is about six-and-a-half data scientists out of
approximately a hundred people. At this point, the group is fairly large given
what we do and given the fact that we are in a startup. Over time the group
has grown into something that has a bit of a hierarchy, but it is still very flat.
We have a VP of data science who is formally in charge and has to deal with
managerial responsibilities. I, as chief scientist, do not have managerial
responsibilities, though I might, arguably, have some notional and intellectual
leadership when people have problems. They value my opinion and come to me.
This separation of responsibilities means I do not head a team per se—I just
get to pick and choose what I want to do. From time to time I might ask
people to do something for me, but I have never really enjoyed telling people
what to do.
One of the benefits of not having managerial responsibilities is that exchanges
with other data scientists are easier. I really like the eye-to-eye exchange,
where you can just bounce ideas off of one another and discuss various things.
It does not work very well if people feel like subordinates to you. I much
prefer to just reach out to someone and say, “Hey! Can we talk about this? I
 