Anna Smith - Data Scientists at Work

Gutierrez: What tools do you use to work with the data?

Smith: When I'm left to my own devices and I don't have to conform to anyone else's stuff, I code in Python. Bitly was an all-Python shop, so that's where I developed my Python capabilities. It was really nice to go in depth with Python and understand its specialties and how to make it clean and efficient with its special tricks.

Here nobody else really writes in Python except for a few people on the data team. Otherwise, people use whatever they want to on our team, so most of it is SQL, just to get access to the data. And then most people outside of our group crunch the data using Excel. Our recommendation engines are built on R. The whole web site is in Java, so I've been learning a bit of that again since undergrad. Then for special projects, we use more specialized tools such as D3.js. For these projects or ideas, we balance getting results and learning new tools. I've found that to really understand something, I need to physically do it or write about it. Just reading about it doesn't quite cut it. I actually need the muscle memory of working with it.

Gutierrez: What's a recent project you've worked on?

Smith: One of the projects we've been working on lately is combining all of the different types of data that we self-collect. It involves not only combining them but also figuring out an easy, fast, and robust way to replicate the process when we want to add more data by, for instance, combining our pixel logs with Google Analytics, which, somewhat unsurprisingly, is a headache. Validating it isn't always so much fun, because I'm like, “What? It's data. It's right. It's correct,” whether you parse it one way or another. That's just a different view of it. The data is correct; it's just a matter of finding the right way to look at it and combine it.

We have a procedure that takes our pixel logs and puts them into HP's Vertica, so then I created another script that pushes all our Google Analytics data into Vertica. Now we can look at them together through Tableau. It's not so much a data science math project as a data science data cleanup project. It's what you might call a bit more of the data engineering aspect of data science. Now that we have all of this data aligned, and everyone's happy, we've been going through it and looking at the numbers. We can then focus on the next steps of what else we can do now that we have these resources. We also spend time thinking about what other data sets we can combine into this big data set to make it even more valuable.
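
To illustrate the kind of alignment and validation step described above, here is a minimal Python sketch. The interview does not show the actual scripts, so everything in it is an assumption: the function names, the row fields (`date`, `device`, `hits`, `sessions`), and the idea of joining on date and device are hypothetical, and the sketch works on in-memory rows rather than talking to Vertica, Google Analytics, or Tableau.

```python
from collections import defaultdict


def combine_daily_counts(pixel_rows, ga_rows):
    """Join two count sources on (date, device).

    Both inputs are lists of dicts. The field names are hypothetical:
    pixel rows carry "hits", analytics rows carry "sessions".
    """
    combined = defaultdict(lambda: {"pixel_hits": 0, "ga_sessions": 0})
    for row in pixel_rows:
        combined[(row["date"], row["device"])]["pixel_hits"] += row["hits"]
    for row in ga_rows:
        combined[(row["date"], row["device"])]["ga_sessions"] += row["sessions"]
    return dict(combined)


def relative_gap(entry):
    """Relative difference between the two sources for one (date, device).

    Returns None when the analytics side has no sessions, since the
    ratio is undefined in that case.
    """
    if entry["ga_sessions"] == 0:
        return None
    return abs(entry["pixel_hits"] - entry["ga_sessions"]) / entry["ga_sessions"]
```

A validation pass along these lines would walk the combined rows and flag any key whose gap is `None` or above some chosen tolerance. Such a flag does not mean either source is wrong; as noted above, the two may simply be different views of the same activity.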

The main reason we started this project is that we just launched a mobile app, so we've been working on understanding how our users use mobile devices, who they are, and how they compare to our web audience. We are looking to better understand questions like: How is the audience distributed across all of the devices? What does that mean as far as how they're interacting with the web site? What are they doing on the app? Why are they buying on the web site but not on the app? Is it just a mental thing or is it a functionality thing?
