Jonathan Lenaghan - Data Scientists at Work

However, in-home and out-of-home locations are very different and will give

different demographic results. So in order to get this right, we had to build a

classifier for “what does it mean for a tile to be residential.”

Gutierrez: Sounds deceptively simple. There must have been more than a

few stumbling blocks. What were they?

Lenaghan: It does sound really easy. You look at the map and search for

a house. Once you see a house, you know the tile is residential, so you are

able to get demographic results. However, doing this across the one billion

tiles in the United States means that you have to do that programmatically

somehow. The power of the classifier comes from being able to designate a

tile as residential or nonresidential. So this was an important step to figure

out. Unfortunately, there is not a good data set that says, “This particular tile

is residential.”

Gutierrez: How did you develop the data set to tell you if a tile was

residential?

Lenaghan: We used a lot of different data sets, including a lot of ad-request

data, and tried a lot of different features to figure out where the residences

were. Again, sounds straightforward, but it was not straightforward at all. As

an example of why we had to use multiple data sets, the census data does not

work because the census data is defined in terms of census blocks, which are

enormous. So if you were to just use census data as your residential signal, you

would have a residential signal essentially everywhere in the United States.

Gutierrez: Tell me about the classifier you developed.

Lenaghan: The classifier we came up with had about sixteen features that

indicated whether or not the tile was residential. We then had to finish building

out this very high-quality residential classifier. Once we had that, we could

figure out from all these location histories what demographic attributes to

give the Air Traveler audience.
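
To make the idea of a per-tile residential classifier concrete, here is a minimal sketch in Python. It is not the classifier described in the interview: the source does not name the sixteen features or the model type, so the three features, the weights, the bias, and the threshold below are all hypothetical and hand-picked purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TileFeatures:
    # Hypothetical features; the interview does not list the real ones.
    night_request_share: float    # share of ad requests seen overnight
    weekend_request_share: float  # share of ad requests seen on weekends
    business_listing_count: int   # known businesses located in the tile


# Illustrative, hand-set weights (assumption: not learned from any data).
WEIGHTS = {
    "night_request_share": 4.0,
    "weekend_request_share": 2.0,
    "business_listing_count": -1.5,
}
BIAS = -2.0


def residential_probability(tile: TileFeatures) -> float:
    """Map a tile's features to a score in (0, 1) with a logistic function."""
    score = BIAS
    score += WEIGHTS["night_request_share"] * tile.night_request_share
    score += WEIGHTS["weekend_request_share"] * tile.weekend_request_share
    score += WEIGHTS["business_listing_count"] * tile.business_listing_count
    return 1.0 / (1.0 + math.exp(-score))


def is_residential(tile: TileFeatures, threshold: float = 0.5) -> bool:
    """Designate a tile as residential when its score reaches the threshold."""
    return residential_probability(tile) >= threshold
```

Because the decision is a pure function of a tile's features, it can be applied programmatically to every tile rather than by inspecting a map by hand, which is the point made above about scale.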

Now we have these in-home and out-of-home components of the audience,

which give us a base data layer for building any sort of movement profile that

we would want. So we can now combine “a device that tends to be in households

with this particular demographic” with “a device that tends to dwell in coffee

shops and has been observed on an auto lot for a particular brand.”

Gutierrez: Is this where the query language comes in?

Lenaghan: Yes. Now that we have the data and the classifier, we then have

to build up the query language to help us create the types of audiences we

wanted. This means the query language has to be able to write these rules and

has to be able to hook into the geospatial base data layer to pull out these

audiences.
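
The interview does not show the query language itself, so the following is only a rough sketch of how rules of this kind might be composed, written in Python. Every name here (`DeviceProfile`, `home_demographic`, `dwells_in`, `seen_on_auto_lot`, `all_of`, `build_audience`) is hypothetical, and the device profile stands in for whatever the geospatial base data layer actually provides.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class DeviceProfile:
    # Hypothetical summary of one device's in-home and out-of-home signals.
    device_id: str
    home_demographics: FrozenSet[str]
    dwell_place_types: FrozenSet[str]
    observed_auto_lot_brands: FrozenSet[str]


# A rule is any predicate over a device profile.
Rule = Callable[[DeviceProfile], bool]


def home_demographic(segment: str) -> Rule:
    """In-home rule: the device's household tile carries this demographic."""
    return lambda device: segment in device.home_demographics


def dwells_in(place_type: str) -> Rule:
    """Out-of-home rule: the device tends to dwell in this type of place."""
    return lambda device: place_type in device.dwell_place_types


def seen_on_auto_lot(brand: str) -> Rule:
    """Out-of-home rule: the device was observed on a lot for this brand."""
    return lambda device: brand in device.observed_auto_lot_brands


def all_of(*rules: Rule) -> Rule:
    """Combine rules so that every one of them must hold."""
    return lambda device: all(rule(device) for rule in rules)


def build_audience(devices: Iterable[DeviceProfile], rule: Rule) -> List[str]:
    """Return the ids of the devices that satisfy the rule, in input order."""
    return [device.device_id for device in devices if rule(device)]
```

An audience such as the one described earlier could then be written as `all_of(home_demographic("segment_a"), dwells_in("coffee_shop"), seen_on_auto_lot("brand_x"))`, where the segment and brand labels are placeholders.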
