Claudia Perlich - Data Scientists at Work

Gutierrez: What data goes into your predictive models now?

Perlich: What we have in terms of data is a partial URL history of actual visitations that we receive from data providers and bid requests from advertising exchanges where we actually buy the impression. Unless a person has cookies disabled, we can observe some of the sites the person has been to. Before we use the URLs, we encode and hash them, as we are not interested in the web page's content or what the actual URL was. What we care about is whether a person's browsing history shows that they have or have not visited any one of the millions of URLs that the data stream contains.
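
The encode-and-hash step described above can be sketched as follows. This is only an illustration under stated assumptions: the interview does not say which hash function or feature-space size is used, so SHA-256, the `NUM_BUCKETS` value, and the names `hash_url` and `encode_history` are all hypothetical.

```python
import hashlib

NUM_BUCKETS = 2 ** 20  # hypothetical size of the hashed feature space


def hash_url(url: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a URL to an opaque bucket index; the URL text itself is not kept."""
    digest = hashlib.sha256(url.strip().encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then fold into a bucket.
    return int.from_bytes(digest[:8], "big") % num_buckets


def encode_history(urls) -> frozenset:
    """Represent one cookie's history as the set of buckets it has visited."""
    return frozenset(hash_url(u) for u in urls)


# Hypothetical example URLs; repeat visits collapse into a single indicator.
history = encode_history([
    "http://photos.example.com/album/1",
    "http://blog.example.org/post",
    "http://photos.example.com/album/1",
])
visited_blog = hash_url("http://blog.example.org/post") in history
```

The only question this representation can answer is whether a given hashed URL appears in a cookie's history, which matches the visited or not-visited framing above.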

From this, we now basically have a binary indicator for each of millions of URLs for any one person (actually any one cookie, as a person can use multiple computers and more than one person could be using the same computer). Using this data, we can then predict whether the person will buy a product, based on having seen a couple thousand other people on the website of the product/brand we are working with. This works very generically on any type of URL data, because we do not have to rely on the data being meaningful for the hashing. It could be a photo-sharing URL, it could be a video URL, it could be a blog URL, it could be a retail site URL, or some other type of completely different website URL. Because we use binary indicators, we are able to use this very generic representation of the data that lends itself to all kinds of URL data for the models. This is now the core value proposition, and in some sense it supports privacy, as we are not interested in extracting meaningful behavior patterns or linking them back to a particular person.
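
To make the idea of predicting from binary URL indicators concrete, here is a minimal sketch. It assumes a logistic regression trained by stochastic gradient descent, which is a choice made for illustration; the passage above does not name the model. The toy data, the learning rate, and the function names `predict` and `train` are likewise hypothetical. Each cookie is a set of hashed URL indices, and the label says whether that cookie was seen on the brand's website.

```python
import math
import random


def predict(weights, bias, active):
    """Estimated probability of a positive label given the active indices."""
    z = bias + sum(weights.get(i, 0.0) for i in active)
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5, seed=0):
    """Fit a logistic regression by SGD on (active_indices, label) pairs."""
    rng = random.Random(seed)
    examples = list(examples)
    weights, bias = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(examples)
        for active, label in examples:
            error = predict(weights, bias, active) - label
            bias -= lr * error
            # Features are binary, so only indices present in this history move.
            for i in active:
                weights[i] = weights.get(i, 0.0) - lr * error
    return weights, bias


# Toy data: label 1 means the cookie was seen on the brand's site.
data = [
    ({1, 2, 3}, 1), ({1, 2}, 1), ({2, 3, 7}, 1),
    ({4, 5}, 0), ({5, 6}, 0), ({4, 6, 7}, 0),
]
weights, bias = train(data)
score_like_visitors = predict(weights, bias, {1, 2})
score_like_others = predict(weights, bias, {4, 5})
```

Because the weights live in a dictionary keyed by hashed index, the model never needs to know what any URL meant, only which indices co-occur with positive labels.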

Gutierrez: How do you and other data scientists fit into Dstillery?

Perlich: Right now, our team is about six-and-a-half data scientists out of approximately a hundred people. At this point, the group is fairly large given what we do and given the fact that we are in a startup. Over time the group has grown into something that has a bit of a hierarchy, but it is still very flat. We have a VP of data science who is formally in charge and has to deal with managerial responsibilities. I, as chief scientist, do not have managerial responsibilities, though I might, arguably, have some notional and intellectual leadership when people have problems. They value my opinion and come to me. This separation of responsibilities means I do not head a team per se; I just get to pick and choose what I want to do. From time to time I might ask people to do something for me, but I have never really enjoyed telling people what to do.

One of the benefits of not having managerial responsibilities is that exchanges with other data scientists are easier. I really like the eye-to-eye exchange, where you can just bounce ideas off of one another and discuss various things. It does not work very well if people feel like subordinates to you. I much prefer to just reach out to someone and say, “Hey! Can we talk about this? I
