Gutierrez: What tools did you use to work with this data?
Smith: I was using Hadoop to store and compute on the data. So it was
stored on Amazon S3 and then we ran it through our MapReduce program.
We also used Elasticsearch so that we could have more data processed at
once. That was the core of the processing. Then once we had all of those
steps done, there was a light Python script that did all of the last-minute data
manipulation and pushed it out in a JSON format. The pipeline then loaded
this JSON into D3.js for the charts and presentation layer. I like to think it
was a clean pipeline, as it was something we worked really hard on. Though, as
these things tend to go as you learn more, it could have been better.
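As a rough illustration of the last step Smith describes (a light Python script that reshapes the processed data and emits JSON for the D3.js charts), here is a minimal sketch. The tab-separated `key<TAB>count` input format, the function name `reducer_output_to_json`, and the `label`/`value` field names are assumptions made for this example; the interview does not describe the actual script.

```python
import json
from collections import defaultdict


def reducer_output_to_json(lines):
    """Sum 'key<TAB>count' lines and return a JSON array sorted by count.

    The input format and the output field names are hypothetical.
    """
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, count = line.split("\t")
        totals[key] += int(count)

    # Largest counts first; ties are broken alphabetically by key.
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    records = [{"label": key, "value": value} for key, value in ordered]
    return json.dumps(records)


# Made-up sample input standing in for the output of the MapReduce step.
sample = ["nytimes.com\t3", "example.org\t1", "nytimes.com\t2"]
chart_json = reducer_output_to_json(sample)
print(chart_json)
```

A flat array of small objects like this is a shape that a charting layer can load and bind directly, which is one plausible reason to finish a pipeline with a step of this kind.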
Gutierrez: Now you're at Rent the Runway. How does it compare to Bitly?
Smith: It's been really different, as they are two very different types of
companies. At Bitly, the company dealt with a more latent data source. People use
it and we see things happen. It was more about trying to capture people's
behaviors and understand what was going on in the Internet. At Rent the
Runway, it's a lot more of trying to support the business, so a lot of pure
business intelligence and business analytics. The problem here is trying to figure
out how we can put data into the product to drive business goals.
Gutierrez: How do you explain what you do to someone not familiar with
computer science, or physics, or data science?
Smith: Like, how do I tell my mom what I'm doing? Well, my mom's a bad
choice since she loves computers. Okay, how about—how would I tell my
sister? I would approach it as I'm solving problems with anything at my
disposal. It's like any job—instead of having court cases to litigate, like my sister,
I have problems that I need to solve. I just happen to do it with data. Oftentimes
that means I need to go to the engineers and ask them for information on
what people are doing on the website, and then I need to go to our databases
and find the dresses that are being rented, and then I need to combine what I
found out into a more refined form. This way, I can expose what's happening in
such a way that we can solve the problem we are seeking to understand.
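The workflow Smith outlines here (take website activity from the engineers, look up rented dresses in the databases, and combine the two into a more refined form) can be sketched roughly as follows. Everything in this example is hypothetical: the `dress_id` field, the record shapes, and the `combine` function are assumptions for illustration, not Rent the Runway's actual data or code.

```python
from collections import Counter


def combine(page_views, rentals):
    """Join view counts and rental counts per dress into one summary list.

    Both inputs are lists of dicts with a hypothetical 'dress_id' key.
    """
    views = Counter(event["dress_id"] for event in page_views)
    rented = Counter(order["dress_id"] for order in rentals)

    summary = []
    # Include dresses that appear in either source; a Counter returns 0
    # for a dress that is missing from one side.
    for dress_id in sorted(set(views) | set(rented)):
        summary.append({
            "dress_id": dress_id,
            "views": views[dress_id],
            "rentals": rented[dress_id],
        })
    return summary


# Made-up sample data standing in for website logs and database rows.
page_views = [{"dress_id": "A"}, {"dress_id": "A"}, {"dress_id": "B"}]
rentals = [{"dress_id": "A"}, {"dress_id": "C"}]
summary = combine(page_views, rentals)
for row in summary:
    print(row)
```

Keeping dresses that appear in only one source makes gaps visible, such as a dress that is viewed but never rented.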
Gutierrez: How would you describe your job to a physicist?
Smith: What I do is like solving any equation. You have inputs, you have
outputs, and then there's a black box. You have to figure out the black box. I guess
it would be analogous to collapsing a waveform. In physics, there's a probability
of where a particle's going to be, and then what happens when you observe it
is that it goes to one spot. And so in data science, you have all these different
possibilities or all these different arrays of data, and you just want to collapse
it into one understandable piece of information that makes sense to the rest
of the world.
 