the decision-making salespeople with approximately real-time data. This is
because if somebody has just downloaded the white paper, or changed their
title, or something else has happened to affect the sales process, or a new
person you want to approach has come into the sales funnel, the model has to
take this new information into account. Regardless of whether people came in
through the website, were on a purchased list, or came in through some
marketing material, we have to score them immediately. So dealing with all of
these moving variables has been a learning opportunity.
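The interview does not say how the scoring system is built. As a minimal sketch, assuming an event-driven design, each incoming event (a white-paper download, a title change, a new lead entering the funnel) updates that lead's features and triggers an immediate rescore. The `Lead` class, the `on_event` and `score_features` functions, the feature names, and the `WEIGHTS` values are all hypothetical; the hand-set linear weights only stand in for whatever trained model would actually be used.

```python
from dataclasses import dataclass, field

# Hypothetical weights standing in for a trained model (not from the interview).
WEIGHTS = {
    "downloaded_whitepaper": 0.3,
    "title_is_executive": 0.5,
    "from_purchased_list": -0.1,
}


@dataclass
class Lead:
    lead_id: str
    features: dict = field(default_factory=dict)
    score: float = 0.0


def score_features(features):
    """Linear score over whatever features are currently known for a lead."""
    return sum(WEIGHTS.get(name, 0.0) * float(value) for name, value in features.items())


def on_event(leads, lead_id, updates):
    """Apply one incoming event to a lead and rescore it right away."""
    # An unknown ID is a new person entering the funnel, whatever the source.
    lead = leads.setdefault(lead_id, Lead(lead_id))
    lead.features.update(updates)
    lead.score = score_features(lead.features)
    return lead.score


leads = {}
on_event(leads, "lead-1", {"from_purchased_list": True})   # new lead from a bought list
on_event(leads, "lead-1", {"downloaded_whitepaper": True})  # later activity
print(round(leads["lead-1"].score, 2))  # 0.2
```

Recomputing the whole score on every event keeps the sketch simple; a larger system might batch events or update scores incrementally instead.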
Gutierrez: Have you faced any data challenges?
Radinsky: A data-specific challenge we've faced had to do with the size of the
data. There's a big hype today around big data, but we are actually a small-data
company. Sure, we collect data from a vast number of sources, but customers
have only a very small amount of information that is relevant to their potential
customers. So they might come to us with, let's say, two hundred potential
customers, though they have seen tens of thousands of noncustomers. These
noncustomers are still relevant to the business, because the fact that they
didn't buy is a signal. Building a statistical model from these two groups is
very hard. On top of that, the classes are completely unbalanced.
Gutierrez: Have you faced any modeling challenges?
Radinsky: There's a prevalent paradigm that data scientists take the data, build
a classifier, and it's just going to work. But it doesn't work that way, and
it's important for non-data scientists to realize that. I'm going to give you
the simplest example we've observed. In this example, we're trying to mimic a
sales process, which can have 6 to 12 stages, depending on our customer.
Of course, each of these steps in the sales process involves data. When we
receive the data, we are just getting a snapshot. For example, say you're trying
to sell to somebody, and this person answered a question about whether he was
happy or not after he became a customer. You're going to have a few repeat
customers, so you'll see them. So this question of whether the customer is
happy or not will have a value of yes or no for people who are customers, and
it's going to be empty for somebody who's not a customer.
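A field like this one, filled in almost only for one class, is a classic case of target leakage: it is recorded after the outcome it would be used to predict. As a minimal sketch (not the method described in the interview), such fields can be flagged before training by comparing how often each field is filled in for customers versus noncustomers. The function names, the `is_customer` label key, the `min_gap` threshold, and the sample records are hypothetical.

```python
def is_filled(value):
    """Treat None and empty strings as 'not filled in'."""
    return value is not None and value != ""


def find_leaky_fields(records, label_key="is_customer", min_gap=0.9):
    """Flag fields whose presence alone nearly separates the two classes."""
    pos = [r for r in records if r[label_key]]
    neg = [r for r in records if not r[label_key]]
    if not pos or not neg:
        raise ValueError("need examples of both classes")
    fields = sorted({k for r in records for k in r if k != label_key})
    leaky = []
    for name in fields:
        fill_pos = sum(is_filled(r.get(name)) for r in pos) / len(pos)
        fill_neg = sum(is_filled(r.get(name)) for r in neg) / len(neg)
        # A large fill-rate gap means the field mostly encodes the label itself.
        if abs(fill_pos - fill_neg) >= min_gap:
            leaky.append(name)
    return leaky


records = [
    {"is_customer": True, "title": "CTO", "is_happy": "yes"},
    {"is_customer": True, "title": "VP Sales", "is_happy": "no"},
    {"is_customer": False, "title": "Engineer", "is_happy": ""},
    {"is_customer": False, "title": None},
]
print(find_leaky_fields(records))  # ['is_happy']
```

Fill-rate gaps catch only one kind of leakage; a field that is filled in for everyone but updated only after the sale would slip through this check.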
So we get this data, we build a statistical model from the historical data,
and then the model focuses on this one simple rule. But when we actually
applied the algorithm in real time, it just said no to all of these potential
customers, because they didn't have this field filled in yet. What's going on
here? There's a process, and we were learning from a completely different stage
of the process. Not only that, we have to take into account that people don't
always fill out the forms correctly or even consistently. What we ended up
doing was completely mimicking our customers' process, rebuilding it from
their data, and building different types of classifiers for each step.
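The fix Radinsky describes, separate classifiers for each step of the customer's process, can be sketched as follows. This is a minimal illustration under stated assumptions, not her team's implementation: each historical lead records the furthest stage it reached and whether it converted; `STAGE_FIELDS` (a hypothetical mapping) lists which fields exist at each stage; and a tiny smoothed conversion-rate model stands in for a real classifier. Each stage's model is trained only on leads that reached that stage and only sees that stage's fields, so a live lead is never judged on a field it cannot have yet.

```python
from collections import defaultdict

# Hypothetical: the fields that are known once a lead reaches each stage.
STAGE_FIELDS = {
    1: ["source"],
    2: ["source", "title"],
    3: ["source", "title", "demo_attended"],
}


def project(record, stage):
    """Keep only the fields available at `stage`, as a hashable key."""
    return tuple(record.get(name) for name in STAGE_FIELDS[stage])


def fit_rate_model(rows, prior_weight=1.0):
    """Stand-in classifier: smoothed conversion rate per feature key."""
    overall = sum(label for _, label in rows) / len(rows)
    hits, totals = defaultdict(float), defaultdict(float)
    for key, label in rows:
        hits[key] += label
        totals[key] += 1

    def predict(key):
        # Unseen or rare keys shrink toward the overall conversion rate.
        return (hits.get(key, 0.0) + prior_weight * overall) / (
            totals.get(key, 0.0) + prior_weight
        )

    return predict


def train_per_stage(history):
    """Train one model per stage, using only leads that reached that stage."""
    models = {}
    for stage in STAGE_FIELDS:
        rows = [
            (project(r, stage), r["converted"])
            for r in history
            if r["reached_stage"] >= stage
        ]
        if rows:
            models[stage] = fit_rate_model(rows)
    return models


def score(models, lead):
    """Score a live lead with the model for the stage it is in right now."""
    stage = lead["stage"]
    return models[stage](project(lead, stage))


history = [
    {"reached_stage": 3, "source": "web", "title": "CTO", "demo_attended": True, "converted": 1},
    {"reached_stage": 3, "source": "web", "title": "CTO", "demo_attended": False, "converted": 0},
    {"reached_stage": 2, "source": "web", "title": "Engineer", "converted": 0},
    {"reached_stage": 1, "source": "list", "converted": 0},
]
models = train_per_stage(history)
print(score(models, {"stage": 1, "source": "web"}))  # 0.3125
```

A real pipeline would swap `fit_rate_model` for a proper classifier, and could address the inconsistent form-filling Radinsky mentions by normalizing field values before `project` is applied.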