Bio-inspired Grammatical Inference - Foundations on Natural and Artificial Computation

with their environment) and hypothesizes complete grammars instantaneously

(this assumption is unrealistic).

The Query learning model proposed by D. Angluin also has some controversial

aspects from a linguistic point of view; for example, the learner is able to ask

the teacher if his hypothesis is correct (such a query will never be produced in

a real situation; a child would never ask the adult if his grammar is the correct

one), and the learner learns exactly the target language (this is not realistic,

since everybody has imperfections in their linguistic competence).
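
To make the objection concrete, the following is a minimal sketch of a teacher in a query learning setting. It assumes the two query types usually associated with Angluin's model, membership and equivalence queries. The names `Teacher`, `membership`, `equivalence`, `max_len` and `even_as` are hypothetical, and the equivalence check is only approximated by comparing all words up to a bounded length.

```python
from itertools import product


class Teacher:
    """Hypothetical teacher for a target language given as a predicate."""

    def __init__(self, target, alphabet, max_len=6):
        self.target = target      # function str -> bool defining the target language
        self.alphabet = alphabet
        self.max_len = max_len    # assumption: equivalence is only tested up to this length

    def membership(self, word):
        """Membership query: does `word` belong to the target language?"""
        return self.target(word)

    def equivalence(self, hypothesis):
        """Equivalence query: return None if the hypothesis agrees with the
        target on every word up to max_len, otherwise a counterexample."""
        for n in range(self.max_len + 1):
            for letters in product(self.alphabet, repeat=n):
                word = "".join(letters)
                if hypothesis(word) != self.target(word):
                    return word
        return None


def even_as(word):
    """Illustrative target language: words with an even number of a's."""
    return word.count("a") % 2 == 0


teacher = Teacher(even_as, "ab")
print(teacher.membership("aab"))            # asks about a single word
print(teacher.equivalence(lambda w: True))  # asks "is my whole grammar correct?"
```

The second query is the one criticized above: the learner submits an entire hypothesis and receives either confirmation or a counterexample, an exchange with no obvious counterpart in the interaction between a child and an adult.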

In the PAC learning model proposed by L. Valiant, the examples provided to

the learner have the same distribution throughout the process; this requirement

is too strong for practical applications.

Therefore, none of these models perfectly accounts for natural language

acquisition. Research in GI has been focused on the mathematical aspects of the

formal models proposed, without exploiting their linguistic relevance. A longer

discussion about these models can be found in [5].

2.2 Language Learning Problem

The problem of language learning concerns both the acquisition of the syntax

(i.e., rules for generating and recognizing correct sentences in the language) and

the semantics (i.e., the underlying meaning of each sentence) of a target language

[26]. However, GI studies have focused only on learning the syntax.

Semantics is not only one component of language learning, but also seems to

play an important role in the first stages of children's language acquisition (as

we will see in the next section). Therefore, it is also of great interest to study

this component. Unfortunately, none of these considerations has been taken into

account in GI studies; the learning problem has been reduced to syntax learning,

and all semantic information has been omitted from these works.

GI algorithms are based on the availability of different types of information:

positive examples, negative examples, the presence of a teacher able to answer

queries, etc. However, what kind of data is available to children? Ideally, to

better understand the process of natural language acquisition and to correctly

simulate it, we should provide our algorithm with the same kind of examples that

are available to children. However, some of the data used by GI algorithms are

controversial from a linguistic point of view. We will discuss some linguistic

studies that try to answer this question in the next section.

In order to make the problem of language learning well defined, it is also

necessary to choose an appropriate class of grammars. The classes of regular and

context-free grammars are often used in GI to model the target grammar. These

two classes constitute the first two levels of the Chomsky hierarchy. Thus, the

following question arises: do they have enough expressive power to describe

natural languages? From a linguistic point of view, it is of great interest to study

classes of grammars that are able to generate the most relevant constructions

that appear in natural languages. However, this does not seem to be the case for

regular and context-free grammars. We will discuss the limitations of the Chomsky

hierarchy in the next section.
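
As a small illustration of how the two classes differ in expressive power, the sketch below contrasts a regular language, recognized by a finite automaton, with a context-free language that no finite automaton can recognize. The example languages ((ab)* and a^n b^n) and the names `accepts_dfa`, `AB_STAR` and `derives_anbn` are chosen here for illustration and do not come from the text.

```python
def accepts_dfa(word, transitions, start, accepting):
    """Run a deterministic finite automaton over `word` (regular language)."""
    state = start
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:          # no transition defined: reject
            return False
    return state in accepting


# Transition table of a DFA for the regular language (ab)*
AB_STAR = {(0, "a"): 1, (1, "b"): 0}


def derives_anbn(word):
    """Recognize the context-free language generated by S -> a S b | epsilon."""
    if word == "":
        return True                       # S -> epsilon
    if word[0] == "a" and word[-1] == "b":
        return derives_anbn(word[1:-1])   # S -> a S b
    return False


print(accepts_dfa("abab", AB_STAR, 0, {0}))
print(derives_anbn("aabb"))
```

The language a^n b^n requires matching an unbounded number of a's against the same number of b's, which a finite set of states cannot do; this is the kind of gap that separates the regular level from the context-free level.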
