Gutierrez: What specific tools are you using?
Shellman: I'm writing a lot of Python these days; it's what all our recommendation
algorithms are written in. The Recommendo API is written in Node.js
and hosted on AWS. We use a lot of open source libraries in Python, like
scikit-learn and pandas. As someone who used to work almost exclusively in
R, I find pandas great because it's cheating in a way. It makes Python a lot like R,
so you get to code in Python but keep a lot of the conveniences that we've all
come to expect from R. Of course, you'll also make yourself insane trying to
remember whether it's “len” or “length,” and whether indexing starts at 0 or 1.
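A small sketch of that tradeoff, using invented data (the DataFrame and column names here are hypothetical, purely for illustration): pandas gives Python R-style data-frame filtering and aggregation, while the language itself keeps `len()` and 0-based indexing.

```python
import pandas as pd

# Hypothetical purchase data for illustration only.
df = pd.DataFrame({
    "brand": ["A", "A", "B"],
    "price": [10.0, 12.0, 8.0],
})

# R-style data-frame work: filter rows, group, and summarize.
summary = df[df["price"] > 9].groupby("brand")["price"].mean()

# The gotcha from above: Python is 0-indexed and uses len(),
# where R is 1-indexed and uses length().
first_row = df.iloc[0]
n_rows = len(df)
```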
Gutierrez: What is a specific project you have worked on recently?
Shellman: It's been a little while, but the beauty replenishment project was
a really fun one. The project started out as a tool for beauty stylists but
evolved into a personalized email campaign. Initially we thought it would be
helpful for stylists to know when their clients were running low on product
so they could give them a call and remind them to come into the store. After
early feedback from the stylists that they likely wouldn't use a tool like that,
we found a home for the beauty replenishment work in a personalized email.
I started by analyzing active beauty customers, going through their beauty
transaction histories to understand what they purchased and then estimate
when they would be ready to replenish.
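One simple way to make that kind of estimate (not necessarily the model she used, and the transactions below are invented): take each customer's median time between repeat purchases of a product and project forward from their last purchase.

```python
import pandas as pd

# Hypothetical transaction history: one row per beauty purchase.
tx = pd.DataFrame({
    "customer": ["c1", "c1", "c1", "c2", "c2"],
    "product":  ["lotion", "lotion", "lotion", "serum", "serum"],
    "date": pd.to_datetime(
        ["2013-01-01", "2013-03-01", "2013-05-01", "2013-02-01", "2013-05-01"]
    ),
})

tx = tx.sort_values("date")
# Days between consecutive purchases of the same product by the same customer.
tx["gap_days"] = tx.groupby(["customer", "product"])["date"].diff().dt.days

# The median gap is a crude usage-rate estimate; last purchase plus the
# median gap gives a predicted replenishment date.
est = (
    tx.groupby(["customer", "product"])
      .agg(last_purchase=("date", "max"), median_gap=("gap_days", "median"))
)
est["predicted_replenish"] = (
    est["last_purchase"] + pd.to_timedelta(est["median_gap"], unit="D")
)
```

A real model would also need to handle customers with a single purchase (no gaps to take a median of) and weight recent purchases more heavily.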
The biggest challenge was that beauty products have fast SKU [Stock Keeping
Unit] turnover. For example, say four months ago I bought lotion, and now
there's a new and improved formula. As a customer, when I replenish my
lotion, the new and improved formula is the same product that I bought four
months ago. However, from the manufacturer's perspective, it's a new SKU.
The issue is that if I didn't account for that SKU ancestry in my analysis I'd
miss a lot of replenishment purchases.
I used record linkage to solve the problem. Record linkage is a technique used to
find duplicates in things like census data and medical records. In survey data it's
typical to have typos and variations in name spellings and you want to link those
separate records into a single entry. I was doing the same thing—only instead of
names and addresses, I had brands, categories, and product descriptions. I forced
matching on things like product type and brand, and then used fuzzy string
matching to measure the similarity between product descriptions. For each
candidate pair, my output was a probability that the two items were the same “record.”
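A minimal sketch of that approach, with an invented catalog (the brands and descriptions are hypothetical): exact, “forced” matching on brand and product type prunes the candidate pairs, and a fuzzy similarity score on the descriptions ranks whatever survives. The standard library's `difflib.SequenceMatcher` stands in here for whichever string matcher she actually used.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical catalog entries: (brand, product_type, description).
# The first old SKU and the new SKU describe the same underlying product.
old_skus = [
    ("GlowCo", "lotion", "GlowCo hydrating body lotion 8oz"),
    ("GlowCo", "lotion", "GlowCo exfoliating scrub 6oz"),
]
new_skus = [
    ("GlowCo", "lotion", "GlowCo hydrating body lotion new formula 8oz"),
]

def similarity(a, b):
    """Fuzzy similarity of two product descriptions, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Force exact matching on brand and product type, then fuzzy-match the
# descriptions of the surviving candidate pairs.
candidates = []
for old, new in product(old_skus, new_skus):
    if old[:2] == new[:2]:  # same brand and product type
        candidates.append((old[2], new[2], similarity(old[2], new[2])))

# The highest-scoring pair is the most likely SKU ancestor.
best = max(candidates, key=lambda c: c[2])
```

Scoring only within a brand-and-type block keeps the number of comparisons manageable; fuzzy-matching every SKU against every other SKU would be quadratic in the size of the catalog.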
Going in I didn't know that SKU turnover would be such a large part of the
project. I was green and not familiar with the product catalog and how SKUs
evolve. That made the project fun and challenging.