4.2 Linguistic Examples
In recent years, many authors have stressed the close relationship between the
genetic tree of the human species and the phylogenetic tree of human languages [2].
Language evolution and the evolution of species appear to be closely related, which
allows us to approach both processes with the same tools. However, there is an
important difference between them. Whereas the genetic codes of two different
species do not cross once the species have separated, the influence of one language
on others for geographical, political or social reasons is constant throughout
history. In spite of this, we think that our constraint-based method can help
researchers establish a hierarchical ordering of features in languages, starting
from the data we know. In the future, an improved version of the method, dealing
with a sufficiently representative number of features, could help to understand the
processes of derivation and contact in languages that are not yet well known.
Natural languages have been classified according to different underlying
principles. In general, it is possible to distinguish the following three
directions: a) Genetic Classification, which pays attention to the historical
evolution of languages; b) Typological Classification, which is obtained by
analysing the internal structure of languages; c) Areal Classification, which
considers geographical closeness and contacts between languages. The different
classifications do not match each other. However, the correlation between them
may be of importance for linguistic research. We claim that our methodology can
contribute to all three types of classification. In what follows, we illustrate
this idea with several examples. For our linguistic 'toy example' we have chosen a
limited set of 9 languages: Basque (BAS), German (GER), English (ENG), Portuguese
(POR), Spanish (SPA), Catalan (CAT), French (FRE), Italian (ITA) and Romanian
(ROM). We have selected, in an almost random way, twenty different features,
gathered in four groups:
1. Phonetics: a) nasal consonants; b) voiced and voiceless fricative dentals; c)
voiced and voiceless fricative post-alveolars; d) double consonants; e) distinction
between voiced and voiceless fricative labiodentals; f) number of vowels.
2. Morphology: a) number of morphological genders; b) phrase agreement; c)
verbal inflection; d) case; e) plural formation.
3. Syntax: a) partitive pronoun; b) noun-adjective precedence; c) SVO order;
d) ergativity; e) obligatory subject; f) 'to be' as auxiliary for the past tenses;
g) auxiliary word for negation.
4. Lexicon: a) days of the week; b) numbers from 10 to 20.
This distribution is well balanced between Phonetics, Morphology and Syntax,
and gives less prominence to Lexicon, the most changeable part of language.
Some of these features have been assigned binary (0/1) values; others take
values over a greater range, which are later normalized.
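The text does not say how the wider-range features are normalized. As a minimal sketch, assuming a simple min-max rescaling to [0, 1] so that they become comparable with the 0/1 features, the step could look like the following. The function name min_max_normalize and the raw values are hypothetical placeholders, not data or code from the study.

```python
def min_max_normalize(values):
    """Rescale a {language: raw_value} mapping to the range [0, 1].

    Assumption: min-max scaling; the source only says the values are
    'later normalized' without naming a method.
    """
    lo = min(values.values())
    hi = max(values.values())
    if hi == lo:
        # All languages share one value, so the feature carries no contrast.
        return {lang: 0.0 for lang in values}
    return {lang: (v - lo) / (hi - lo) for lang, v in values.items()}


# Placeholder raw values for one multi-valued feature (illustration only,
# not measurements taken from the languages named).
raw_feature = {"BAS": 2, "GER": 6, "ENG": 4}
normalized = min_max_normalize(raw_feature)
print(normalized)
```

After this step a binary feature and a rescaled feature share the same range, so neither dominates a comparison merely because of its scale.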
Intuitively, it is easy to see that, if every feature is given the same weight,
completing a consistent tree is going to be a hard task. However, the weights
can help the system to build a 'correct' tree. As an example, ergativity is a very
 