done is that we had added some new data from the exchanges. But once you
know how hard it is to predict human behavior, the fact that you doubled
performance—that is kind of scary. What ended up happening is that, yeah, it
had doubled the performance because there were bots committing fraud. We
were able to figure this out from the data because the bots were behaving
very deterministically.
The whole thing had started out as a pet project to figure out why we had
doubled performance, which in some ways is a great thing. But we were just
skeptical enough not to believe it. It turned out that a whole bot network was
fooling us. Okay, fine, I am glad we talked about this and figured it out. From
that we then implemented a whole overhead system to watch for this. That
is very typical for these things. We start looking into some things and saying,
“Hey, there's something really surprising going on,” and typically, the answer's
completely different from where you started. That is the lesson learned: you
always have to keep an open mind to these side things. I think it is a good skill
set that you are not just narrowing down on the particular problem you are
looking at. Just keep your mind open and see what else is going on here. You
will typically find a lot more going on in that process.
Gutierrez: What makes a good data scientist?
Perlich: I think this is really the marksmanship of a good data scientist—you
have to have some amount of intuition about what should be happening. You
do not have to be a medical specialist to realize that the patient ID being
predictive is a problem. It just takes some amount of common sense to observe
that. I think this intuition develops with a lot of experience. You cannot
just make a data scientist out of a computer scientist or a mathematician
necessarily.
What I have observed is that there is a group of people who can embrace
uncertainty and noise and what it means. There is another group of people
who love to live in a deterministic black-and-white world. In a sense, they
believe that when you sort the list, it is sorted. And once the algorithm sorts
the list, it will always sort things right, because that is what it was made to do.
The algorithm is either correct or it is not, but you have a very clear metric
for correct.
Once you move to the side of data, the whole world develops a lot more gray
areas. It is actually very interesting for me to see the interactions with some of
our engineering team. Some get that and actually figure this out, and some
just feel that they are done when they have implemented the steps on the list. This
last group does not get the part that, once you have implemented the steps,
you have to start looking at the output to check whether the output makes
sense. “Makes sense” or “this should not really be happening” are not part of
these programmers' informal checklist.
 