Kira Radinsky - Data Scientists at Work


the decision-making salespeople with approximately real-time data. This is

because if somebody just downloaded the white paper, or somebody changed

their title, or something happened to affect the sales process, or you have a

new person come into the sales funnel that you want to approach, the model

has to take into account this new information. Regardless of whether people

came in through the website, or were on a list that was bought in or came in

from some marketing material, we have to score them immediately. So it has

been a learning opportunity to deal with all of these moving variables.

Gutierrez: Have you faced any data challenges?

Radinsky: A data-specific challenge we've faced had to do with the size of the

data. There's big hype today around big data, but we are actually a small data

company. Sure, we collect data from a vast number of sources, but customers

have a very small amount of information that is relevant for their potential

customers. So they can come with, let's say, two hundred potential customers,

though they have seen tens of thousands of noncustomers. These

noncustomers are still relevant for the business because they didn't buy and that is a

signal. Building a statistical model from the data of these two groups is

very hard. More than that, it's completely unbalanced.
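
To make the imbalance concrete, here is a minimal sketch of one common way to compensate for it: weighting each class inversely to its frequency. This is not necessarily what Radinsky's team did. The counts (200 customers, 20,000 noncustomers) are illustrative stand-ins for "two hundred" and "tens of thousands", and the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative counts only: 1 = became a customer, 0 = did not buy.
labels = [1] * 200 + [0] * 20_000
weights = balanced_class_weights(labels)

print(weights[1])  # weight applied to each customer example
print(weights[0])  # weight applied to each noncustomer example
```

With these assumed counts, each customer example ends up weighted about a hundred times more heavily than each noncustomer example, which keeps a model from simply predicting "noncustomer" for everyone.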

Gutierrez: Have you faced any modeling challenges?

Radinsky: There's a prevalent paradigm that data scientists take the data and

build a classifier and that it's just going to work. But it doesn't work that way,

and it's important for non-data scientists to realize that. I'm going to give you

the simplest example we've observed. In this example, we're trying to mimic a

sales process, which can be 6 to 12 stages, depending on our customer.

Of course, each of these steps in the sales process involves data. When we

receive the data, we are just getting a snapshot. For example, say you're

trying to sell to somebody and this person has answered a question regarding

whether he is happy or not after he became a customer. You're going to have

a few repeat customers, so you'll see them. So for this question of whether

the customer is happy or not, it will have a value of yes or no for people who are

customers, and it's going to be empty for somebody who's not a customer.

So we get this data, we build this statistical model from historical past data,

and then the model focuses on this simple single rule. But then when we

actually applied the algorithm in real time, it just said no about all these potential

people because they still didn't have this field filled in. What's going on here?

There's a process, and we're learning from a completely different stage of

the process. Not only that, we have to take into account that people don't

always fill out the forms correctly or even in a consistent manner. What we

ended up doing is mimicking completely the process of our customers and

trying to rebuild it from their data and building different types of classifiers

for each step.
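
The failure described above, and the idea of training on only what is known at each step, can be sketched as follows. This is an illustration under stated assumptions, not the team's actual system: the field names, the stage numbers, and both helper functions are hypothetical.

```python
# Hypothetical mapping from field name to the earliest sales stage at which
# the field is populated. Stage numbers and field names are illustrative only.
FIELD_AVAILABLE_FROM = {
    "job_title": 1,
    "downloaded_white_paper": 1,
    "attended_demo": 3,
    "is_happy": 7,  # answered only after the person has become a customer
}


def features_for_stage(record, stage):
    """Keep only the fields that would already be known at the given stage."""
    return {
        name: value
        for name, value in record.items()
        if FIELD_AVAILABLE_FROM.get(name, float("inf")) <= stage
    }


def leaky_rule(record):
    """The single rule a model can latch onto when trained on the snapshot."""
    return record.get("is_happy") is not None


historical_customer = {
    "job_title": "VP Sales",
    "downloaded_white_paper": True,
    "attended_demo": True,
    "is_happy": "yes",
}
new_lead = {"job_title": "CTO", "downloaded_white_paper": True}

print(leaky_rule(historical_customer))  # True
print(leaky_rule(new_lead))             # False: the field is still empty

# Training view for a stage-1 classifier: the post-sale field is excluded.
stage_one_view = features_for_stage(historical_customer, stage=1)
print(sorted(stage_one_view))
```

Filtering the historical records this way before training each per-stage classifier means the model only ever sees fields that will also be present when a new lead is scored at that stage.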
