prediction”—I'm willing to bet people are going to reinvent over the next few
years. I have this paper that I coauthored with Léon Bottou, Patrick Haffner,
and Yoshua Bengio that basically describes convolutional nets as well as the
full end-to-end system we built. The paper is called “Gradient-Based Learning
Applied to Document Recognition.” It's from 1998 and it was published in the
Proceedings of the IEEE. It's a very long paper and it was written a couple of years
after we finished the system.
The check-recognition system we built started as a research prototype that
then got turned into a product. A company named NCR, which at the time
was a subsidiary of AT&T, deployed the product in their machines. It was first
deployed in 1996. And in 1996 AT&T split itself up and basically ended the
project because the research group stayed with AT&T while the development
group went to Lucent Technologies. The product group, NCR as a company,
was spun off, so the whole project was disbanded right after it became
incredibly successful. The only disappointing thing about the project was that we
never really received internal credit for the success except that I was made a
manager! It was depressing for me that at this great moment of success, all of a
sudden the whole company that made it possible decided to break itself up.
It took us a couple of years to write a long paper that described the entire
system. So the first half of it is basically: Here is what convolutional nets are all
about, here is how you implement them, and then here is everything else you
need to know about the technique. Then the second half of the paper is how
you integrate this with things like language models or language interpretation
models. For instance, when you're reading a piece of text, if it's English text,
you have a grammar for English, so you want a system on top of it that extracts
the most likely interpretation that is part of the language. And what you'd
like to be able to do is to train the system to simultaneously do the recognition
and the segmentation, as well as provide the right input for the language
model. We managed to figure out how to do this.
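To make the decoding side of that idea concrete, here is a minimal sketch in Python: a recognizer produces per-position character scores, a bigram language model scores character transitions, and Viterbi decoding picks the string that is most likely under both together. This is an illustration with invented numbers, not the actual machinery from the paper, and it leaves out the joint training described above; every name and value in it is an assumption for the example.

import numpy as np

ALPHABET = ["a", "b", "c"]

def viterbi_decode(char_scores, bigram_log_probs):
    # char_scores: (T, V) per-position log-scores from the recognizer.
    # bigram_log_probs: (V, V) log P(next char | previous char) from the LM.
    T, V = char_scores.shape
    best = char_scores[0].copy()          # best log-score ending in each char
    back = np.zeros((T, V), dtype=int)    # backpointers for path recovery
    for t in range(1, T):
        # cand[i, j] = best path ending in char i, extended with char j
        cand = best[:, None] + bigram_log_probs + char_scores[t][None, :]
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0)
    # Follow backpointers from the best final character.
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return "".join(ALPHABET[i] for i in reversed(path))

# Toy numbers (invented): alone, the recognizer would read "aba", but the
# language model's strong b->c bigram flips the last position, giving "abc".
char_scores = np.log(np.array([[0.80, 0.10, 0.10],
                               [0.10, 0.80, 0.10],
                               [0.45, 0.15, 0.40]]))
bigram = np.log(np.array([[0.10, 0.80, 0.10],
                          [0.10, 0.10, 0.80],
                          [0.40, 0.30, 0.30]]))
print(viterbi_decode(char_scores, bigram))  # -> abc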
Since then the technique has been reinvented in different forms multiple times
for different contexts of natural language processing and for other things.
There are models called CRFs (conditional random fields), as well as the
structured perceptron, and then later things such as structured SVMs, which are
very much in the same spirit except they're not deep. Our system was deep.
So the second half of that paper talks about how you do this. Sadly, it seems
like very few people ever read the second half!
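For contrast with the deep system, here is a minimal sketch of the shallow relative mentioned above, a Collins-style structured perceptron for sequence labeling: a linear model over emission and transition features, decoded here by brute-force enumeration for brevity (a real system would use Viterbi), with the classic update that adds the gold sequence's features and subtracts the predicted one's. The tags, words, and feature names are illustrative assumptions, not anything from the paper.

from collections import defaultdict
import itertools

TAGS = ["NOUN", "VERB"]

def score(weights, words, tags):
    # Linear score: emission features (tag, word) plus transitions (prev, tag).
    s, prev = 0.0, "<s>"
    for w, t in zip(words, tags):
        s += weights[("emit", t, w)] + weights[("trans", prev, t)]
        prev = t
    return s

def decode(weights, words):
    # Brute-force argmax over all tag sequences; fine at toy scale.
    return max(itertools.product(TAGS, repeat=len(words)),
               key=lambda tags: score(weights, words, tags))

def train(data, epochs=5):
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = decode(weights, words)
            if pred != tuple(gold):
                # Perceptron update: push gold features up, predicted down.
                prev_g = prev_p = "<s>"
                for w, g, p in zip(words, gold, pred):
                    weights[("emit", g, w)] += 1.0
                    weights[("emit", p, w)] -= 1.0
                    weights[("trans", prev_g, g)] += 1.0
                    weights[("trans", prev_p, p)] -= 1.0
                    prev_g, prev_p = g, p
    return weights

data = [(["dogs", "bark"], ["NOUN", "VERB"]),
        (["cats", "sleep"], ["NOUN", "VERB"])]
w = train(data)
print(decode(w, ["dogs", "sleep"]))  # -> ('NOUN', 'VERB')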
I'm extremely proud of our work. It's probably the thing I'm most proud of.
The convolutional net, which is what I'm known for, has had a huge impact
in recent years. There was nothing conceptually complicated about it. What
was difficult was to actually make it work, particularly given
the computational resources we had at the time and the software tools that
we had to build ourselves. I'm very proud of some of the early work on back
 