Kira Radinsky - Data Scientists at Work

going on there, I only know that one of the main oil pipelines in the US is going through Beebe, Arkansas, but for me, this was the first data adventure I took to understand what was happening.

Gutierrez: Where did this project take you?

Radinsky: In thinking about what I had found and how I had found it, I realized that the problem with these queries is that I only see what people want to show me. That is, I only see the data for what people thought was interesting enough to search for. I wouldn't see things like “oxygen depletion is causing all the fish deaths.” From this I started thinking about how I could take all the news that people ever wrote, to look for causality.

But first, I had to figure out how to define causality. I started with things like “x causes y.” I looked at newspaper articles and how somebody would write something like “oil spills are caused by accidents.” So I started by looking for all of the phrases that showed causality in that particular manner. Then when I had those phrases, I started doing semantic analysis at a pragmatic level to figure out who did the action, who it was done to, and so on. I scanned all the newspapers I could find from 1851 until today. And from all of those, I built what I called the “causality graph.” It's all the events and how they are connected based on the phrases.
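As a rough illustration of the phrase-matching step described above, here is a minimal Python sketch. It is not Radinsky's system: the two regular expressions, the function names `extract_causal_pair` and `build_causality_graph`, and the sample sentences are all assumptions made for this example, and the semantic analysis of who did what to whom is left out entirely.

```python
import re
from collections import defaultdict

# Hypothetical patterns covering only the two phrasings quoted in the interview:
# "<effect> is/are caused by <cause>" and "<cause> causes <effect>".
CAUSAL_PATTERNS = [
    re.compile(
        r"^(?P<effect>.+?)\s+(?:is|are|was|were)\s+caused\s+by\s+(?P<cause>.+?)\.?$",
        re.IGNORECASE,
    ),
    re.compile(r"^(?P<cause>.+?)\s+causes?\s+(?P<effect>.+?)\.?$", re.IGNORECASE),
]


def extract_causal_pair(sentence):
    """Return (cause, effect) in lowercase, or None if no pattern matches."""
    text = sentence.strip()
    for pattern in CAUSAL_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("cause").lower(), match.group("effect").lower()
    return None


def build_causality_graph(sentences):
    """Map each cause to the set of effects found for it."""
    graph = defaultdict(set)
    for sentence in sentences:
        pair = extract_causal_pair(sentence)
        if pair is not None:
            cause, effect = pair
            graph[cause].add(effect)
    return dict(graph)


graph = build_causality_graph([
    "Oil spills are caused by accidents.",
    "Oxygen depletion causes fish deaths.",
    "The weather was nice.",  # no causal phrase, so it is skipped
])
```

In this toy version each node is just the raw phrase; a real pipeline would need to normalize phrases into events before connecting them.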

From there, I quickly realized that it wasn't enough, because I needed to add some kind of layer of abstraction to it. For instance, let's say you have an earthquake in Australia. If you look at the past, you've never had an earthquake in Australia, so what is it going to do? You cannot predict from that. However, there have been past earthquakes in Turkey, so the first thing I need to do is know that an earthquake in Turkey and an earthquake in Australia are both earthquakes, and that Turkey and Australia are both countries. How do I know that and have my models know that? What I did is go to Wikipedia and look for different ways of abstracting entities. In this case, I would know that you could say both of these places are countries. With this added layer of abstraction, we could now say I was closer to having solved this problem.
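The abstraction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ENTITY_TYPES` table is a hand-written stand-in for categories that would come from Wikipedia, and the names `generalize` and `find_similar_past_events` are invented for this example.

```python
# Hypothetical, hand-written stand-in for entity categories drawn from Wikipedia.
ENTITY_TYPES = {
    "Turkey": "country",
    "Australia": "country",
    "Ankara": "city",
}


def generalize(event):
    """Replace the place in an (action, place) event with its abstract type."""
    action, place = event
    return action, ENTITY_TYPES.get(place, place)


def find_similar_past_events(new_event, past_events):
    """Return past events that look the same once places are abstracted."""
    target = generalize(new_event)
    return [event for event in past_events if generalize(event) == target]


past_events = [
    ("earthquake", "Turkey"),
    ("flood", "Turkey"),
    ("earthquake", "Ankara"),
]
matches = find_similar_past_events(("earthquake", "Australia"), past_events)
```

Here the never-seen event "earthquake in Australia" matches "earthquake in Turkey" because both generalize to "earthquake in a country."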

Next, let's say that I do have an earthquake in Australia. The system finds a similar pattern in the past, like an earthquake in Turkey. The system would look at what that particular event caused in the past, and it would find something somebody wrote in the news, like: “Red Cross help sent to Ankara after an earthquake in Turkey.” So the system would look at that and would output “Red Cross help sent to Ankara after an earthquake in Australia,” which is a problem, because it's not the truth and it's not something that would actually happen.
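The failure described above is easy to reproduce. The sketch below is a deliberately naive illustration, not the system's actual logic; `naive_transfer` is a hypothetical helper that swaps only the matched place name and leaves everything else in the past headline untouched.

```python
def naive_transfer(past_headline, old_place, new_place):
    """Copy a past effect, swapping only the place that was matched."""
    return past_headline.replace(old_place, new_place)


past_headline = "Red Cross help sent to Ankara after an earthquake in Turkey"
prediction = naive_transfer(past_headline, "Turkey", "Australia")
# "Ankara" survives the swap because nothing tells the function
# how Ankara relates to Turkey.
```

The result still mentions Ankara, which is exactly the wrong output quoted in the interview.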

The next thing I did was to adapt the prediction to what was going on in the current event that we were looking at. This meant the system had to understand that, in the past, the cause and effect happened because Ankara is the capital of Turkey. Once it understood that, then it would apply this function to what was going on now. Now the system would say, “Earthquake in Australia.
