done is that we had added some new data from the exchanges. But once you
know how hard it is to predict human behavior, the fact that you doubled
performance—that is kind of scary. What ended up happening is that, yeah, it
had doubled the performance because there were bots committing fraud. We
were able to figure this out from the data because the bots were behaving
very deterministically.
The whole thing had started out as a pet project to figure out why we had
doubled performance, which in some ways is a great thing. But we were just
skeptical enough not to believe it. It turned out that a whole bot network was
fooling us. Okay, fine, I am glad we talked about this and figured it out. From
that we then implemented a whole overhead system to watch for this. That
is very typical for these things. We start looking into some things and saying,
“Hey, there's something really surprising going on,” and typically, the answer's
completely different from where you started. That is the lesson learned: you
always have to keep an open mind to these side things. I think it is a good skill
set that you are not just narrowing down on the particular problem you are
looking at. Just keep your mind open and see what else is going on here. You
will typically find a lot more going on in that process.
Gutierrez: What makes a good data scientist?
Perlich: I think this is really the marksmanship of a good data scientist—you
have to have some amount of intuition about what should be happening. You
do not have to be a medical specialist to realize that the patient ID being
predictive is a problem. It just takes some amount of common sense to observe
that. I think this intuition develops with a lot of experience. You cannot
just make a data scientist out of a computer scientist or a mathematician
necessarily.
What I have observed is that there is a group of people who can embrace
uncertainty and noise and what it means. There is another group of people
who love to live in a deterministic black-and-white world. In a sense, they
believe that when you sort the list, it is sorted. And once the algorithm sorts
the list, it will always sort things right, because that is what it was made to do.
The algorithm is either correct or it is not, but you have a very clear metric
for correct.
Once you move to the side of data, the whole world develops a lot more gray
areas. It is actually very interesting for me to see the interactions with some of
our engineering team. Some get that and actually figure this out, and some
just feel that they are done when they have implemented the steps on the list. This
last group does not get the part that, once you have implemented the steps,
you have to start looking at the output to check whether the output makes
sense. “Makes sense” or “this should not really be happening” are not part of
these programmers' informal checklist.
 