Gutierrez: What tools do you use to work with the data?
Smith: When I'm left to my own devices and I don't have to conform to anyone
else's stuff, I code in Python. Bitly was an all-Python shop, so that's where I
developed my Python capabilities. It was really nice to go in depth with Python,
and understand its specialties and how to make it clean and efficient with its
special tricks.
Here nobody else really writes in Python except for a few people on the data
team. Otherwise, people use whatever they want to on our team, so most of
it is SQL, just to get access to the data. And then most people outside of our
group crunch the data using Excel. Our recommendation engines are built on
R. The whole web site is in Java, so I've been learning a bit of that again since
undergrad. Then for special projects, we use more specialized tools such as
D3.js. For these projects or ideas, we balance getting results and learning new
tools. I've found that to really understand something, I need to physically do it
or write about it. Just reading about it doesn't quite cut it. I actually need the
muscle memory of working with it.
Gutierrez: What's a recent project you've worked on?
Smith: One of the projects we've been working on lately is combining all of the
different types of data that we self-collect. It involves not only combining them
but also figuring out an easy, fast, and robust way to replicate the process when
we want to add more data by, for instance, combining our pixel logs with Google
Analytics—which, somewhat unsurprisingly, is a headache. Validating it isn't
always so much fun, because I'm like, “What? It's data. It's right. It's correct”—
whether you parse it one way or another. That's just a different view of it. The
data is correct, it's just finding the right way to look at it and combine it.
We have a procedure that takes our pixel logs and puts them into HP's Vertica,
so then I created another script that pushes all our Google Analytics data into
Vertica. Now we can look at them together through Tableau. It's less a
data science math project than a data science data cleanup project.
It's what you might call a bit more of the data engineering aspect of data
science. Now that we have all of this data aligned, and everyone's happy, we've been
going through it and looking at the numbers. We can then focus on the next
steps of what else we can do now that we have these resources. We also
spend time thinking about what other data sets we can combine into this big
data set to make it even more valuable.
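[Editor's illustration, not the pipeline Smith describes: a minimal sketch of the load-then-combine pattern, assuming Python's built-in sqlite3 module as a stand-in for Vertica. The table names, column names, helper functions, and sample rows are all hypothetical placeholders.]

```python
import sqlite3


def load_rows(conn, table, rows):
    """Create a (day, device, n) table if needed and append rows to it."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (day TEXT, device TEXT, n INTEGER)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)


def combined_view(conn):
    """Return per-day, per-device counts from both sources side by side."""
    query = """
        SELECT p.day, p.device, p.n AS pixel_hits, g.n AS ga_count
        FROM pixel_logs AS p
        LEFT JOIN ga_daily AS g
          ON g.day = p.day AND g.device = p.device
        ORDER BY p.day, p.device
    """
    return conn.execute(query).fetchall()


# In-memory database standing in for the shared warehouse (assumption).
conn = sqlite3.connect(":memory:")

# Placeholder rows for illustration only; not real measurements.
load_rows(conn, "pixel_logs", [("2000-01-01", "web", 120), ("2000-01-01", "app", 30)])
load_rows(conn, "ga_daily", [("2000-01-01", "web", 110)])

rows = combined_view(conn)
for row in rows:
    print(row)
```

The LEFT JOIN keeps every pixel-log row even when the second source has no matching day and device, so gaps between the two sources show up as missing values instead of disappearing, which is the kind of mismatch the validation step has to explain.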
The main reason we started this project is that we just launched a mobile app, so
we've been working on understanding how our users use mobile devices, who
they are, and how they compare to our web audience. We are looking to better
understand questions like: How is the audience being distributed across all of
the devices? What does that mean as far as how they're interacting with the web
site? What are they doing on the app? Why aren't they buying on the app but
buying on the web site? Is it just a mental thing or is it a functionality thing?