going on there, I only know that one of the main oil pipelines in the US is going
through Beebe, Arkansas, but for me, this was the first data adventure I took
to understand what was happening.
Gutierrez: Where did this project take you?
Radinsky: In thinking about what I had found and how I had found it, I
realized that the problem with these queries is that I only see what people want
to show me. That is, I only see the data for what people thought was
interesting enough to search for. I wouldn't see things like “oxygen depletion is causing
all the fish deaths.” From this I started thinking about how I could take all the
news that people ever wrote, to look for causality.
But first, I had to figure out how to define causality. I started with things
like “x causes y.” I looked at newspaper articles and how somebody would
write something like “oil spills are caused by accidents.” So I started by
looking for all of the phrases that showed causality in that particular manner. Then
when I had those phrases, I started doing semantic analysis at a pragmatic level
to figure out who did the action, who it was done to, and so on. I scanned all
the newspapers I could find from 1851 until today. And from all of those, I
built what I called the “causality graph.” It's all the events and how they are
connected based on the phrases.
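
The phrase-matching step described above can be sketched in a few lines. This is a minimal illustration, not the system from the interview: the two regular expressions, the function names, and the dictionary-of-sets graph are all assumptions, and it skips the semantic analysis of who did what to whom.

```python
import re
from collections import defaultdict

# Hypothetical surface patterns for causal phrasing. The passive form is
# listed first so that "... are caused by ..." is not misread as active.
CAUSAL_PATTERNS = [
    re.compile(
        r"^(?P<effect>.+?) (?:is|are|was|were) caused by (?P<cause>.+?)\.?$",
        re.IGNORECASE,
    ),
    re.compile(
        r"^(?P<cause>.+?) (?:causes|cause|caused|leads to|led to) (?P<effect>.+?)\.?$",
        re.IGNORECASE,
    ),
]


def extract_causal_pair(sentence):
    """Return (cause, effect) if the sentence matches a pattern, else None."""
    text = sentence.strip()
    for pattern in CAUSAL_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("cause").lower(), match.group("effect").lower()
    return None


def build_causality_graph(sentences):
    """Collect cause -> {effects} edges from every sentence that matches."""
    graph = defaultdict(set)
    for sentence in sentences:
        pair = extract_causal_pair(sentence)
        if pair is not None:
            cause, effect = pair
            graph[cause].add(effect)
    return graph


graph = build_causality_graph([
    "Oil spills are caused by accidents.",
    "Oxygen depletion causes fish deaths.",
    "The weather was nice.",
])
```

Here `graph` ends up with one edge from "accidents" to "oil spills" and one from "oxygen depletion" to "fish deaths"; the third sentence contributes nothing because it contains no causal phrase.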
From there, I quickly realized that it wasn't enough, because I needed to add
to it some kind of layer of abstraction. For instance, let's say you have an
earthquake in Australia. If you look at the past, you've never had an earthquake in
Australia, so what is it going to do? You cannot predict from that. However,
there have been past earthquakes in Turkey, so the first thing I need to do is
know that an earthquake in Turkey and an earthquake in Australia are both
earthquakes and that Turkey and Australia are both countries. How do I know that
and have my models know that? What I did is go to Wikipedia and look for different
ways of abstracting entities. In this case, I would know that you could say both of
these places are countries. With this added layer of abstraction, we could now
say I was closer to having solved this problem.
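
As a rough illustration of that abstraction layer, the sketch below replaces each place with its type before comparing events. The `ENTITY_TYPES` table is a hypothetical hand-written stand-in for categories mined from Wikipedia, and the tuple representation of an event is likewise an assumption.

```python
# Hypothetical type table standing in for abstractions mined from Wikipedia.
ENTITY_TYPES = {
    "Turkey": "country",
    "Australia": "country",
    "Ankara": "city",
}


def generalize(event):
    """Map (action, place) to (action, type of place)."""
    action, place = event
    # Unknown places are left as-is, so they only match themselves.
    return action, ENTITY_TYPES.get(place, place)


def find_analogues(new_event, past_events):
    """Return past events that look the same as new_event after abstraction."""
    target = generalize(new_event)
    return [e for e in past_events if e != new_event and generalize(e) == target]


past_events = [("earthquake", "Turkey"), ("flood", "Turkey")]
analogues = find_analogues(("earthquake", "Australia"), past_events)
```

With this table, an earthquake in Australia has no exact match in `past_events`, but it does match the earthquake in Turkey once both places are abstracted to "country".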
Next, let's say that I do have an earthquake in Australia. The system finds a similar
pattern in the past, like an earthquake in Turkey. The system would look at what
that particular event caused in the past, and it would find something somebody
wrote in the news, like: “Red Cross help sent to Ankara after an earthquake in
Turkey.” So the system would look at that and would output “Red Cross help
sent to Ankara after an earthquake in Australia,” which is a problem, because
it's not the truth and it's not something that would actually happen.
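
The failure is easy to reproduce. The sketch below is a deliberately naive, hypothetical transfer step that copies the past headline and swaps only the place named in the cause; the function name and the string-replacement approach are assumptions made for illustration.

```python
def naive_transfer(past_headline, past_place, new_place):
    """Copy a past effect, swapping only the place named in the cause."""
    return past_headline.replace(past_place, new_place)


headline = "Red Cross help sent to Ankara after an earthquake in Turkey"
prediction = naive_transfer(headline, "Turkey", "Australia")
```

The prediction still mentions Ankara, because nothing in this step knows how Ankara relates to Turkey.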
The next thing I did was to adapt the prediction to what was going on in the
current event that we were looking at. This meant the system had to
understand that, in the past, the cause and effect happened because Ankara is the
capital of Turkey. Once it understood that, then it would apply this function to
what was going on now. Now the system would say, “Earthquake in Australia.
 