commute is contextually the same as your Thursday A.M. commute, and
Sunday lunch is always Sunday lunch.
We also have a very sophisticated ontology/taxonomy that we use internally.
All of our data and all of our categories of this data get mapped to this
ontology. So this framework that was built out is very sophisticated. It
actually makes scaling much easier to do because you are not trying to boil
the whole ocean.
Gutierrez: So this is the background to the project.
Lenaghan: Correct. This was our location targeting. The big project I want
to talk about, which was important to the company, was what we call our
Audience product line. I briefly covered this earlier. The Audience product
line is our device targeting offering, as opposed to our location targeting.
When I came here, we started to think, “So we're targeting location, which is
great. Location histories are going to be even better.” And so this was taking
the ad-request logs and joining them with the geospatial layer that had already
been built.
Gutierrez: What was the first step in this project?
Lenaghan: We started by writing a query language that allowed us to create
profiles and audiences out of the ad-request logs joined with the geospatial
data layer. The first Audience we wanted to build was air travelers, which
meant we wanted to be able to look at all the location histories of devices
that had been observed in an airport. This was actually an enormous project.
It started off in fits, and there were a lot of things that did not scale so well.
We started off trying to build an Air Traveler audience by finding points in
polygons across the United States. As a first step, we started off by using the
polygons of airports. It is a very complicated computational geometry problem
to find points in polygons mathematically [the point-in-polygon problem].
There are fast ways to do it, but the generic, canned implementations you find
are extremely slow. This approach just did not scale: it was really slow, and
it produced terrible results.
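
To make concrete why the brute-force approach was so expensive, the following is a minimal sketch of a naive point-in-polygon scan in Python. The ray-casting test, the polygon representation, and the observation format are illustrative assumptions for this sketch, not PlaceIQ's actual code.

```python
# A sketch of the naive approach: test every observation against every
# airport polygon with a ray-casting point-in-polygon check.

def point_in_polygon(lon, lat, polygon):
    """Ray-casting (even-odd) test: a point is inside if a horizontal ray
    from it crosses the polygon boundary an odd number of times."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def devices_in_airports(observations, airport_polygons):
    """observations: iterable of (device_id, lon, lat) from ad-request logs.
    airport_polygons: list of [(lon, lat), ...] rings.
    Every point is tested against every polygon, which is exactly the part
    that does not scale to national ad-request volumes."""
    hits = set()
    for device_id, lon, lat in observations:
        for polygon in airport_polygons:
            if point_in_polygon(lon, lat, polygon):
                hits.add(device_id)
                break
    return hits
```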
Gutierrez: How did you solve it?
Lenaghan: We solved it by tiling our polygons. You still capture and map
data to these tiles. It's just that, especially for larger polygons, like Walmart
stores, airports, and similar giant structures, the error that you have is small
once you tile it. Once you work at the tile level, everything becomes kind of
abstract again. You have all these keys, and you are doing large key-value joins.
I wrote the first framework to do that work.
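
The following is a minimal sketch of that tiling idea, assuming a simple fixed-size latitude/longitude grid; the tile size, key scheme, and function names are illustrative assumptions rather than PlaceIQ's framework. The point is that once polygons and observations are both reduced to tile keys, the geometry drops out and the work becomes a plain key-value join.

```python
# A sketch of tiling polygons onto a fixed lat/lon grid (assumed scheme).

TILE_DEG = 0.001  # roughly 100 m of latitude per tile; an assumed size

def tile_key(lon, lat):
    """Quantize a coordinate to the integer key of its grid tile."""
    return (int(lon // TILE_DEG), int(lat // TILE_DEG))

def polygon_tiles(polygon):
    """Approximate a polygon by the tiles in its bounding box whose centers
    fall inside it (same ray-casting test as in the earlier sketch)."""
    def contains(lon, lat):
        inside, n = False, len(polygon)
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            if (y1 > lat) != (y2 > lat):
                if lon < x1 + (lat - y1) * (x2 - x1) / (y2 - y1):
                    inside = not inside
        return inside

    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    tiles = set()
    for tx in range(int(min(lons) // TILE_DEG), int(max(lons) // TILE_DEG) + 1):
        for ty in range(int(min(lats) // TILE_DEG), int(max(lats) // TILE_DEG) + 1):
            if contains((tx + 0.5) * TILE_DEG, (ty + 0.5) * TILE_DEG):
                tiles.add((tx, ty))
    return tiles

def air_traveler_audience(observations, airport_polygons):
    """Once airport polygons are reduced to a set of tile keys, building the
    audience is a key-value join: quantize each observation to its tile key
    and look it up in the airport key set."""
    airport_keys = set()
    for polygon in airport_polygons:
        airport_keys |= polygon_tiles(polygon)
    return {device_id for device_id, lon, lat in observations
            if tile_key(lon, lat) in airport_keys}
```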
Once we had the audience, the next part of the project was figuring out the
demographics of that audience. You are able to make particular anonymized
inferences about the demographics according to where people happen to be.
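
As one illustration of what such an inference could look like, the following is a minimal sketch that averages a hypothetical tile-level demographics table over the tiles where an audience was observed; the attribute names and the simple averaging are assumptions for this sketch, not PlaceIQ's methodology.

```python
# A sketch of audience-level demographic inference from tile-level data.
from collections import defaultdict

def audience_demographics(device_tiles, tile_demographics):
    """device_tiles: {device_id: set of tile keys where it was observed}.
    tile_demographics: {tile_key: {"median_income": ..., "pct_commuters": ...}}
    (a hypothetical table, e.g. census attributes mapped onto the tile grid).
    Returns audience-level averages only, never per-device attributes."""
    totals, observed = defaultdict(float), 0
    for tiles in device_tiles.values():
        for tile in tiles:
            demo = tile_demographics.get(tile)
            if demo is None:
                continue
            for attribute, value in demo.items():
                totals[attribute] += value
            observed += 1
    if observed == 0:
        return {}
    return {attribute: total / observed for attribute, total in totals.items()}
```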
 