center facility. The colo will help with storing location data that is very sensitive. Technically, all of the data will be stored in Apache's Hadoop Distributed File System (HDFS).
Gutierrez: As your team expands, what types of people are you looking for
and how do you actually know that they are good?
Lenaghan: When we are looking for people, we are looking for very passionate people who are quantitatively minded. Even though we use Hadoop
a lot here, being an expert in Hadoop is not a job requirement. We want
people who can think logically, scientifically, and quantitatively about problems.
We want them to be able to accurately identify what works and does not
work. We also want them to know why things do not work, even though they
thought they were going to work. Being self-critical is important.
Our interview process consists more of probing to understand how they
think rather than, “How would you do this particular graph algorithm in a
map-reduce framework?” We are interested more in raw skills than in particular skills for our data science team. Whether we are making a junior or a senior hire, we are looking for that quantitative piece. We have hired people
on the junior level who have very little programming/software engineering
experience. They had to learn those skills on the job and now they are writing
fantastic code. So hiring based on raw ability rather than specific experience
has not been a problem at all. That said, we occasionally need a very specialized person for a very specialized task, but that is the exception to our usual
hiring practices.
Gutierrez: Are there any tools not currently in your workflow that you are
excited about?
Lenaghan: One of the technologies we are looking at is Julia. A member of the data science team is working on figuring out where we can use Julia in our workflow. Right now, because we are on
Amazon, we pay for the compute time. So we definitely want to cut down our
compute costs as much as possible. Once we move into the colo, it will be less
of a concern, but we still want to cut down our compute times.
We run many processes hundreds of billions of times a month. When you are
running algorithms on ad-request logs, even something as simple as converting from a latitude and longitude to a tile makes a big difference in compute
times and costs. Making these types of very small changes is important in our
work, so we are always looking for more performant numerical techniques.
Julia looks very promising in this area, so that is why we have a person working
on figuring out how to include it in our workflow.
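To make the latitude/longitude-to-tile example concrete, here is a minimal Python sketch using the standard Web Mercator "slippy map" tiling formula. The actual tile scheme in use is not described in the interview, so the function name and zoom parameter are illustrative assumptions, not production code.

```python
import math

def lat_lon_to_tile(lat_deg, lon_deg, zoom):
    """Convert a WGS84 latitude/longitude to Web Mercator tile coordinates.

    This is the standard "slippy map" tiling formula; it stands in for
    whatever proprietary tile scheme is actually used.
    """
    n = 2 ** zoom                                  # tiles per axis at this zoom
    x = int((lon_deg + 180.0) / 360.0 * n)         # linear in longitude
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# A single call is cheap, but when it runs hundreds of billions of times
# a month, trimming even a few operations translates into compute savings.
print(lat_lon_to_tile(40.7484, -73.9857, 12))      # -> (1206, 1539)
```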
I would also like to learn more about Clojure. I think the fewer lines of code
that you have to write, the better. Just looking at some Clojure projects, it
seems very promising to me. Functional programming languages lend themselves very well to things we do a great deal of, such as distributed computing.
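To illustrate that point, here is a minimal sketch, in Python rather than Clojure for consistency with the example above, of why functional style suits distributed work: a side-effect-free "map" step can run on any worker in any order, and an associative "reduce" step merges the partial results. The word-count task and all names here are hypothetical.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def count_words(chunk):
    # Pure function: the result depends only on the input chunk,
    # so chunks can be handed to workers in any order.
    return Counter(chunk.split())

def merge(left, right):
    # Associative merge: partial results can be combined
    # pairwise in any grouping (the "reduce" step).
    return left + right

if __name__ == "__main__":
    chunks = ["to be or not to be", "that is the question", "be that as it may"]
    with Pool() as pool:
        partials = pool.map(count_words, chunks)   # parallel "map" step
    print(reduce(merge, partials, Counter()))
```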
 