prediction”—I'm willing to bet people are going to reinvent over the next few
years. I have this paper that I coauthored with Léon Bottou, Patrick Haffner,
and Yoshua Bengio that basically describes convolutional nets as well as the
full end-to-end system we built. The paper is called “Gradient-Based Learning
Applied to Document Recognition.” It's from 1998 and it was published in the
Proceedings of the IEEE. It's a very long paper and it was written a couple of years
after we finished the system.
The check-recognition system we built started as a research prototype that
then got turned into a product. A company named NCR, which at the time
was a subsidiary of AT&T, deployed the product in their machines. It was first
deployed in 1996. And in 1996 AT&T split itself up and basically ended the
project because the research group stayed with AT&T while the development
group went to Lucent Technologies. The product group, NCR as a company,
was spun off, so the whole project was disbanded right after it became
incredibly successful. The only disappointing thing about the project was that we
never really received internal credit for the success except that I was made a
manager! It was depressing for me that at this great moment of success, all of a
sudden the whole company that made it possible decided to break itself up.
It took us a couple of years to write a long paper that described the entire
system. So the first half of it is basically: Here is what convolutional nets are all
about, here is how you implement them, and then here is everything else you
need to know about the technique. Then the second half of the paper is how
you integrate this with things like language models or language interpretation
models. For instance, when you're reading a piece of text, if it's English text,
you have a grammar for English, so you want a system on top of it that extracts
the most likely interpretation that is part of the language. And what you'd
like to be able to do is to train the system to simultaneously do the recognition
and the segmentation, as well as provide the right input for the language
model. We managed to figure out how to do this.
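To make the decoding side of that idea concrete, here is a minimal sketch in Python: a recognizer produces per-position character scores, a bigram language model scores character transitions, and Viterbi decoding picks the string that is most likely under both together. This is an illustration with invented numbers, not the actual machinery from the paper, and it leaves out the joint training described above; every name and value in it is an assumption for the example.

import numpy as np

ALPHABET = ["a", "b", "c"]

def viterbi_decode(char_scores, bigram_log_probs):
    # char_scores: (T, V) per-position log-scores from the recognizer.
    # bigram_log_probs: (V, V) log P(next char | previous char) from the LM.
    T, V = char_scores.shape
    best = char_scores[0].copy()          # best log-score ending in each char
    back = np.zeros((T, V), dtype=int)    # backpointers for path recovery
    for t in range(1, T):
        # cand[i, j] = best path ending in char i, extended with char j
        cand = best[:, None] + bigram_log_probs + char_scores[t][None, :]
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0)
    # Follow backpointers from the best final character.
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return "".join(ALPHABET[i] for i in reversed(path))

# Toy numbers (invented): alone, the recognizer would read "aba", but the
# language model's strong b->c bigram flips the last position, giving "abc".
char_scores = np.log(np.array([[0.80, 0.10, 0.10],
                               [0.10, 0.80, 0.10],
                               [0.45, 0.15, 0.40]]))
bigram = np.log(np.array([[0.10, 0.80, 0.10],
                          [0.10, 0.10, 0.80],
                          [0.40, 0.30, 0.30]]))
print(viterbi_decode(char_scores, bigram))  # -> abc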
Since then the technique has been reinvented in different forms multiple times
for different contexts of natural language processing and for other things.
There are models called CRFs (conditional random fields), as well as the
structured perceptron, and then later things such as structured SVMs, which are
very much in the same spirit except they're not deep. Our system was deep.
So the second half of that paper talks about how you do this. Sadly, it seems
like very few people ever read the second half!
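For contrast with the deep system, here is a minimal sketch of the shallow relative mentioned above, a Collins-style structured perceptron for sequence labeling: a linear model over emission and transition features, decoded here by brute-force enumeration for brevity (a real system would use Viterbi), with the classic update that adds the gold sequence's features and subtracts the predicted one's. The tags, words, and feature names are illustrative assumptions, not anything from the paper.

from collections import defaultdict
import itertools

TAGS = ["NOUN", "VERB"]

def score(weights, words, tags):
    # Linear score: emission features (tag, word) plus transitions (prev, tag).
    s, prev = 0.0, "<s>"
    for w, t in zip(words, tags):
        s += weights[("emit", t, w)] + weights[("trans", prev, t)]
        prev = t
    return s

def decode(weights, words):
    # Brute-force argmax over all tag sequences; fine at toy scale.
    return max(itertools.product(TAGS, repeat=len(words)),
               key=lambda tags: score(weights, words, tags))

def train(data, epochs=5):
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = decode(weights, words)
            if pred != tuple(gold):
                # Perceptron update: push gold features up, predicted down.
                prev_g = prev_p = "<s>"
                for w, g, p in zip(words, gold, pred):
                    weights[("emit", g, w)] += 1.0
                    weights[("emit", p, w)] -= 1.0
                    weights[("trans", prev_g, g)] += 1.0
                    weights[("trans", prev_p, p)] -= 1.0
                    prev_g, prev_p = g, p
    return weights

data = [(["dogs", "bark"], ["NOUN", "VERB"]),
        (["cats", "sleep"], ["NOUN", "VERB"])]
w = train(data)
print(decode(w, ["dogs", "sleep"]))  # -> ('NOUN', 'VERB')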
I'm extremely proud of our work. It's probably the thing I'm most proud of.
The convolutional net, which is what I'm known for, has had a huge impact
in recent years. There was nothing conceptually complicated about it. What
was difficult was to actually make it work, particularly given
the computational resources we had at the time and the software tools that
we had to build ourselves. I'm very proud of some of the early work on back
 