Kira Radinsky - Data Scientists at Work

Red Cross help sent to Canberra.” I looked for these types of relations—

capital and country—and others using not only Wikipedia, but hundreds of

other data sets from a project called Linked Data. The biggest one of these

connected data sets was, of course, DBpedia, which is just structured

information from Wikipedia.

With this causality graph I could now ask it anything I wanted. An interesting

example I used to give is what it taught me when I wanted to buy an iPad.

I asked the system, “How much does an iPad cost? Tell me what it's going to be.”

The system then told me that prices were going to go up. I was curious why

it thought that, so I had it backtrack how it went through this causality graph.

It told me that prices were going to go up because of the tsunami in Japan.

I then asked how the price of an Apple product in the United States was going

to go up because of the tsunami in Japan. It then showed me the relation. The

chain was basically: a tsunami occurred in Japan, there were factories on the

shore, some of those factories make some of the substances needed for a

chip factory in China, which makes iPads for the United States. So the system

had calculated that if factories on the shore were affected, it could lead to a

shortage of materials. It had seen in the past that when you have a shortage

of something, prices go up. It was an interesting observation to be able to

deduce.
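
The backtracking described above can be pictured as a path search over a graph of cause-and-effect edges. What follows is only a minimal sketch under stated assumptions: the edge list paraphrases the iPad chain from the interview, and the names CAUSAL_EDGES and explain are hypothetical, not taken from the actual system. It uses a plain breadth-first search to recover the shortest chain linking a cause to an effect.

```python
from collections import deque

# Hypothetical cause -> effect edges, paraphrasing the chain in the interview.
# A real causality graph would be learned from news data, not written by hand.
CAUSAL_EDGES = {
    "tsunami in Japan": ["coastal factories damaged"],
    "coastal factories damaged": ["shortage of chip materials"],
    "shortage of chip materials": ["chip production in China slows"],
    "chip production in China slows": ["iPad supply shortage"],
    "iPad supply shortage": ["iPad prices go up"],
}


def explain(cause, effect, edges):
    """Return the shortest chain of events from cause to effect, or None."""
    queue = deque([[cause]])
    seen = {cause}
    while queue:
        path = queue.popleft()
        if path[-1] == effect:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


chain = explain("tsunami in Japan", "iPad prices go up", CAUSAL_EDGES)
print(" -> ".join(chain))
```

Asking the same question in the reverse direction returns None, because the edges are directed from cause to effect.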

However, the problem with looking at causation is that sometimes trivial

things are generated. This was one of the things that surprised me, because

it could take several hops of causality to reach something trivial. For instance,

I would give it the following scenario, “Israeli professor killed after bombing,”

and ask it what would happen next. I would also ask people the same question

to be able to compare answers. The people I asked would say that it would

eventually lead to protests and other bad things happening. In comparison, the

system would say, “a funeral will be held.” And the funny thing is that the

system would actually generate an entire text of that, which is great because it's

true, but it is pretty much trivial. The reason for that is that we trained the

system on what people were going to view as causes and effects. This made

me realize I needed to move to the next level.

The next step was then to look for important correlations. The way

I approached this was to look for storylines or patterns of history. A

storyline would come from something like several news articles discussing

approximately the same entities for a news item. This could be something like

somebody gets shot, somebody gets arrested, there is a trial, someone gets

acquitted, and so on. So this would be a storyline. Once we had lots and lots

of storylines, I then looked for patterns in those storylines. When a pattern

emerged, I would do a semantic analysis of those news articles, extract

different entities, and find out what was going on between one and another, including

the abstractions I mentioned earlier—country and its capital and so forth, as

well as the relationship between entities that came from the causality graph.
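
One way to picture the storyline step is as grouping articles that mention roughly the same entities and then counting which events tend to follow which. The sketch below is an illustration under assumptions, not the method used in the actual system: the article records, the Jaccard overlap threshold of 0.5, and the names build_storylines and frequent_transitions are all hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two entity sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_storylines(articles, threshold=0.5):
    """Greedily attach each article, in date order, to the first storyline
    whose entity set overlaps enough; otherwise start a new storyline."""
    storylines = []
    for art in sorted(articles, key=lambda a: a["day"]):
        ents = set(art["entities"])
        for story in storylines:
            if jaccard(ents, story["entities"]) >= threshold:
                story["events"].append(art["event"])
                story["entities"] |= ents
                break
        else:
            storylines.append({"entities": set(ents), "events": [art["event"]]})
    return storylines


def frequent_transitions(storylines):
    """Count how often one event type is directly followed by another."""
    counts = Counter()
    for story in storylines:
        counts.update(zip(story["events"], story["events"][1:]))
    return counts


# Made-up articles: entity names and event labels are placeholders.
articles = [
    {"day": 1, "entities": ["Person A", "City X"], "event": "shooting"},
    {"day": 2, "entities": ["Person A", "City X"], "event": "arrest"},
    {"day": 3, "entities": ["Person A", "City X", "Court"], "event": "trial"},
    {"day": 1, "entities": ["Person B", "City Y"], "event": "shooting"},
    {"day": 4, "entities": ["Person B", "City Y"], "event": "arrest"},
]

stories = build_storylines(articles)
counts = frequent_transitions(stories)
print(counts.most_common())
```

With these placeholder articles, two storylines emerge, and the transition from a shooting to an arrest appears in both, which is the kind of recurring pattern the text describes looking for.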
