Via Spanish
Original text: out of sight, out of mind
Retranslated text: outside Vista, the mind
Original text: the spirit is willing but the flesh is weak
Retranslated text: the alcohol is arranged but the meat is weak
It is not difficult to understand why progress in Machine Translation has
been as slow as that in natural language understanding. If a program
cannot understand what a sentence means in one language, how can it
correctly translate that sentence into another language? Only in the case
of single sentences can we envisage the day when an enormous corpus
of sentences with their translations will suffice, without any
understanding being necessary, just like looking up a word in a dictionary.
Then NLP researchers will be discussing not word-for-word translations
but sentence-for-sentence ones. There will be rules for determining which
alternative words can be slotted into translations, so that the
translation of “My cat likes milk” can also be used, with one or more
substitutions, to translate “My dog likes milk”. In some languages it will
be good enough to replace the translation of “cat” with that of “dog”,
while in other languages there may be rules that require a little more:
rules relating to the gender of nouns, the conjugation of verbs, or some
other aspect of the target language.
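The sentence-for-sentence idea described above can be sketched in a few
lines of Python. This is only a toy illustration under stated assumptions,
not a method taken from the text: the one-entry corpus, the three-word
lexicon, the choice of French as the target language, and the names
SENTENCE_CORPUS, LEXICON, POSSESSIVE_MY and translate are all hypothetical.
It shows a stored sentence translation being reused with a substituted
noun, plus one target-language rule (gender agreement of the possessive).

```python
# Toy sentence-for-sentence translator with slot substitution.
# All data and names here are hypothetical, chosen only for illustration.

# Stored corpus: an English sentence template mapped to a French template.
# {noun} marks the substitutable slot; {poss} is filled by an agreement rule.
SENTENCE_CORPUS = {
    "my {noun} likes milk": "{poss} {noun} aime le lait",
}

# Tiny bilingual lexicon: English noun -> (French noun, grammatical gender).
LEXICON = {
    "cat": ("chat", "m"),
    "dog": ("chien", "m"),
    "cow": ("vache", "f"),
}

# Target-language rule: the French word for "my" agrees with the noun's gender.
POSSESSIVE_MY = {"m": "mon", "f": "ma"}


def translate(sentence):
    """Return a French translation, or None if no stored sentence fits."""
    words = sentence.lower().rstrip(".").split()
    for source_template, target_template in SENTENCE_CORPUS.items():
        template_words = source_template.split()
        if len(words) != len(template_words):
            continue
        noun = None
        for word, slot in zip(words, template_words):
            if slot == "{noun}":
                noun = word          # remember the word occupying the slot
            elif word != slot:
                break                # fixed words differ: template does not fit
        else:
            # Every fixed word matched; substitute if the noun is known.
            if noun in LEXICON:
                target_noun, gender = LEXICON[noun]
                result = target_template.format(
                    poss=POSSESSIVE_MY[gender], noun=target_noun
                )
                return result[0].upper() + result[1:]
    return None


print(translate("My cat likes milk"))   # Mon chat aime le lait
print(translate("My dog likes milk"))   # Mon chien aime le lait
print(translate("My cow likes milk"))   # Ma vache aime le lait
```

Swapping “cat” for “dog” needs only the lexicon lookup, whereas “cow”
also triggers the gender rule, which is the kind of extra rule the
paragraph above anticipates. A sentence with no stored counterpart, such
as “My cat hates milk”, returns None: nothing is understood, only looked up.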
A Question of Scale
On the first page of this chapter I suggested that the difficulties faced by
NLP researchers in developing good conversational software are partly a
matter of scale. Here is why I believe that to be the case.
Yorick Wilks' pronouncement, “AI is a little software and a lot of
data”, is becoming increasingly prophetic. As impressive results are
achieved in solving problems in AI, we more and more often learn that
a result has come from making good use of a big database. Expert systems,
for example, tend to rely on large databases of knowledge, often
expressed as rules.
I believe that great successes in NLP will be achieved when statistical
approaches are applied to massive corpora, far bigger than the ones
hitherto in use. Statistical approaches based on very large corpora will
enable researchers to develop generalized models of linguistic phenomena
based on actual examples of these phenomena provided by the corpora