Gutierrez: Does everybody use Python for prototyping?
Lenaghan: On the prototype building side, we use Python and scikit-learn,
the Python machine learning library, a great deal. A lot of the other guys on
the team use R, especially those that come from more of a statistics
background, as they are very proficient in R. Then we also have the guys who came
from more of the finance side, so they still write a lot of Java.
Gutierrez: Is data munging a big part of your work, and if so, what tools do
you use?
Lenaghan: When it comes to munging, it is definitely true, even for me,
that 80 percent of the work I do is munging data. When I worked in finance,
I learned to do that very quickly and efficiently in Perl. Since I started at
PlaceIQ I have not used Perl. Now I do all of the data munging in Python.
Gutierrez: Is data visualization a big part of your work and, if so, what tools
do you use?
Lenaghan: Even though I use Python for pretty much everything, I do not use
any visualization tools in Python. I know that matplotlib is great and it looks
great. It is just that I haven't invested the time so that it just sort of flows out
of my fingers. So to visualize data, we use a variety of other tools.
Geospatial visualization is a giant, hairy, terrible problem. We do have our own
geospatial visualization program that we use internally, which works well. But
for anything else that is not geospatial, I use R and ggplot2. I use R for everything
else because it is what I am familiar with, everything looks beautiful, it
works very well, and it is extremely functional. I can show it to people on the
sales side and they like it. Amusingly, they still take the data, put it into Excel,
and make their own plots with it.
Gutierrez: Tell me about a specific project that you have worked on. Take
me through the thinking behind the project, how you built it, and what lessons
you learned.
Lenaghan: First, let's talk about the location targeting before we cover the
project, so we have a base of understanding. Before I came to PlaceIQ,
the geospatial layer had been built out fairly well. Duncan McCall and Steve
Milton, as well as the early employees of the company, had very clear and very
good ideas about how to tackle geospatial at scale. The big idea was that you
wanted to tame the spatial dimension by keying everything in terms of the
100-meter-by-100-meter tiles. No matter what data you have, it had to be
attributed to a tile.
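The following is a minimal Python sketch of what keying every spatial record to a roughly 100-meter-by-100-meter tile could look like. The grid scheme is an illustrative assumption and is not PlaceIQ's actual tiling. It uses fixed 100 m latitude bands on a spherical Earth, and within each band the longitude step is widened by 1/cos(latitude) so tiles stay about 100 m wide. The names `tile_key` and `bucket_by_tile` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Assumed constants: spherical Earth radius and a 100 m tile edge.
EARTH_RADIUS_M = 6_371_000.0
TILE_SIZE_M = 100.0

# Height of one tile in degrees of latitude (constant on a sphere).
LAT_STEP_DEG = math.degrees(TILE_SIZE_M / EARTH_RADIUS_M)


def tile_key(lat: float, lon: float) -> Tuple[int, int]:
    """Map a (lat, lon) point to an integer (row, col) tile key.

    Hypothetical scheme: rows are fixed 100 m latitude bands; within a row,
    the longitude step is scaled by 1/cos(row-center latitude) so each
    tile is roughly 100 m wide on the ground.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    row = math.floor((lat + 90.0) / LAT_STEP_DEG)
    center_lat = -90.0 + (row + 0.5) * LAT_STEP_DEG
    # Clamp cos() so tiles at the poles do not divide by zero.
    cos_lat = max(math.cos(math.radians(center_lat)), 1e-9)
    lon_step = LAT_STEP_DEG / cos_lat
    col = math.floor((lon + 180.0) / lon_step)
    return row, col


def bucket_by_tile(
    points: Iterable[Tuple[float, float, Any]],
) -> Dict[Tuple[int, int], List[Any]]:
    """Group (lat, lon, payload) records by the tile each point falls in."""
    buckets: Dict[Tuple[int, int], List[Any]] = defaultdict(list)
    for lat, lon, payload in points:
        buckets[tile_key(lat, lon)].append(payload)
    return dict(buckets)
```

Once every record carries a tile key, data from unrelated sources can be joined or aggregated on that key instead of on raw coordinates. That is the point of "taming" the spatial dimension. A production system would more likely use an established projection or a hierarchical grid than this hand-rolled one.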
Gutierrez: Every kind of data had to be keyed into these tiles?
Lenaghan: Every kind of spatial data. For temporal data, we divide the week
up into 26 time periods that are culturally relevant, so that allows us to
not have to worry about the clock time. For instance, your Tuesday A.M.
 