Almost every human child succeeds in learning language. As a result, people often tend to take the process of language learning for granted. To many, language seems like a basic instinct, as simple as breathing or blinking. But, in fact, language is the most complex ability that a human being will ever master. The fact that all people succeed in learning to use language, whereas not all people learn to swim or do calculus, demonstrates how fully language conforms to our human nature. In a very real sense, language is the complete expression of what it means to be human.
Linguists in the Chomskyan tradition (Pinker, 1994) tend to think of language as having a universal core from which individual languages select out a particular configuration of optional features known technically as ‘parameters’ (Chomsky, 1982). As a result, they see language as an instinct driven by specifically human evolutionary adaptations. In their view, language resides in a unique mental organ that has been given as a Special Gift to the human species. This mental organ contains rules, constraints, and other structures that can be specified by linguistic analysis. Without guidance from this universal core, the child would be unable to piece together the intricate details of language structure.
Many psychologists (Fletcher & MacWhinney, 1995) and linguists who reject the Chomskyan approach view language learning from a very different perspective. To the psychologist, language development is a window on the operation of the human mind. The patterns of language emerge not from a unique instinct, but from the operation of general processes of evolution, cognition, social processes, and facts about the human body. For researchers who accept this emergentist approach, the goal of language acquisition studies is to understand how regularities in linguistic form emerge from the operation of low-level physical, neural, and social processes. Before considering the current state of the dialogue between the view of language as a hard-wired instinct and the view of language as an emergent process, let us review a few basic facts about the developmental course of language acquisition and some of the methods used to study it.
Early auditory development
William James (1890) described the world of the newborn as a “blooming, buzzing confusion.” However, we now know that, at the auditory level at least, the newborn’s world is remarkably well structured. The cochlea and auditory nerve provide extensive preprocessing of signals for frequency and intensity. In the 1970s and 1980s, researchers discovered that human infants were specifically adapted at birth to perceive contrasts such as that between /p/ and /b/, as in pit and bit. Subsequent research showed that even chinchillas are capable of making this distinction. This suggests that much of the basic structure of the infant’s auditory world can be attributed to fundamental processes in the mammalian ear and cochlear nucleus. Beyond this basic level of auditory processing, it appears that infants have a remarkable capacity to record and store sequences of auditory events. It is as if the infant has something akin to a tape recorder in the auditory cortex that records input sounds, replays them, and accustoms the ear to their patterns, well before learning the actual meanings of these words (Fig. 1).
One clever method for studying early audition relies on the fact that babies tend to habituate to repeated stimuli from the same perceptual class. If the perceptual class of the stimulus suddenly changes, the baby will brighten up and turn to look at the new stimulus. If the experimenter constructs a set of words which share a certain property and then shifts to words that have a different property, the infant may demonstrate awareness of the distinction through preferential looking (Fig. 2). For example, a baby may slowly habituate to a long string of syllables that have the form ABA, such as badaba-nopano-rinori-punapu. If the string then shifts to an ABB structure, as in badada-rinono-satoto-punana, the infant will perk up and show increased attention to the new string.
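The contrast between the two syllable strings can be made concrete with a short sketch. It assumes, purely for illustration, that every syllable is written with exactly two letters (one consonant, one vowel), as in the examples above; the function names `syllables`, `pattern`, and `first_shift` are hypothetical and do not come from any experimental software.

```python
import string


def syllables(word, size=2):
    """Split a word into fixed-width syllables (assumes 2-letter CV syllables)."""
    if len(word) % size != 0:
        raise ValueError(f"{word!r} cannot be split into {size}-letter syllables")
    return [word[i:i + size] for i in range(0, len(word), size)]


def pattern(word):
    """Label the repetition structure of a word, e.g. 'badaba' -> 'ABA'."""
    labels = {}
    out = []
    for syl in syllables(word):
        if syl not in labels:
            # Each new syllable gets the next unused letter.
            labels[syl] = string.ascii_uppercase[len(labels)]
        out.append(labels[syl])
    return "".join(out)


def first_shift(stream):
    """Index of the first word whose pattern differs from the opening pattern.

    Returns None if the pattern never changes.
    """
    if not stream:
        return None
    familiar = pattern(stream[0])
    for i, word in enumerate(stream):
        if pattern(word) != familiar:
            return i
    return None


habituation = ["badaba", "nopano", "rinori", "punapu"]   # ABA items
test_items = ["badada", "rinono", "satoto", "punana"]    # ABB items
shift_at = first_shift(habituation + test_items)
```

Here `shift_at` marks the point in the stream at which the abstract pattern changes, which is the point at which the infant would be expected to show renewed attention. The sketch captures only the structure of the stimuli, not the infant's perceptual processing.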
Figure 1. The human peripheral auditory apparatus.
Infants also demonstrate preferences for the language that resembles the speech of their mothers. Thus, a French infant will prefer to listen to French, whereas a Polish infant will prefer to listen to Polish. In addition, they demonstrate a preference for their own mother’s voice, as opposed to that of other women. Together, these abilities and preferences suggest that, during the first eight months, the child is remarkably attentive to language. Although babies are not yet learning words, they are acquiring the basic auditory and intonational patterns of their native language. As they sharpen their ability to hear the contrasts of their native language, they begin to lose the ability to hear contrasts not represented in that language. If the child is growing up in a bilingual world, full perceptual flexibility is maintained. However, when growing up as a monolingual, flexibility in processing is gradually traded off for quickness and automaticity.
During the first three months, babies produce cries that express hunger, distress, and sometimes pain. By 3 months, at the time of the first social smile, they begin to make the delightful little sounds that we call ‘cooing’ (Fig. 3). By 6 months, the infant is producing structured vocalizations, including a larger diversity of individual vowels and consonants, mostly structured into the shape of consonant-vowel (CV) syllables like ta or pe. The basic framework of early babbling is built on top of patterns of noisy lip-smacking that are present in many primates. These CV vocal gestures include some form of vocal closure followed by a release with vocalic resonance.
Until the sixth month, deaf infants babble much like hearing children. However, well before 9 months, deaf infants lose their interest in babbling. This suggests that their earlier babbling is sustained largely through proprioceptive and somaesthetic feedback, as the baby explores the various ways in which she can play with her mouth. After 6 months, babbling relies increasingly on auditory feedback. During this period, the infant tries to produce specific sounds to match up with specific auditory impressions. It is at this point that the deaf child no longer finds babbling entertaining, since it is not linked to auditory feedback. These facts suggest that, from the infant’s point of view, babbling is essentially a process of exploring the coordinated use of the mouth, lungs, and larynx.
In the heyday of behaviorism, researchers viewed the development of babbling in terms of reinforcement theory. They thought that the reinforcing qualities of language would lead a Chinese baby to babble the sounds of Chinese, whereas a Quechua baby would babble the sounds of Quechua. This was the theory of ‘babbling drift.’ However, closer observation has indicated that no drift toward the native language occurs until well after 9 months. By 12 months, there is some slight drift in the direction of the native language, as the infant begins to acquire the first words. Proponents of universal phonology have sometimes suggested that all children engage in babbling all the sounds of all the world’s languages. Here, again, the claim seems to be overstated. Although it is certainly true that some English-learning infants will produce Bantu clicks and Quechua implosives, not all children produce all of these sounds.
Figure 2. The preferential looking task in which changes in infant gaze signal discrimination of differences in words.
Figure 3. (a) Spectrogram of a 3-month-old boy cooing. (b) Mother imitating her child after listening many times to a tape-loop on which the baby noises are recorded.
The first words
The emergence of the first word is based on three earlier developments. The first is the infant’s growing ability to record the sounds of words. The second is the development of an ability to control vocal productions that occurs in the late stages of babbling. The third is the general growth of the symbolic function, as represented in play, imitation, and object manipulation.
Piaget (1954) characterized the infant’s cognitive development in terms of the growth of representation or the ‘object concept.’ In the first six months of life, the child is unable to think about objects that are not physically present. However, a 12-month-old will see a dog’s tail sticking out from behind a chair and realize that the rest of the dog is hiding behind the chair. This understanding of how parts relate to wholes supports the child’s first major use of the symbolic function. When playing with toys, the 12-month-old will begin to produce sounds such as vroom or bam-bam that represent properties of these toys and actions. Often these phonologically consistent forms appear before the first real words. Because they have no clear conventional status, parents may tend to ignore these first symbolic attempts as nothing more than spurious productions or babbling.
Even before producing the first conventional word, the 12-month-old has already acquired an ability to comprehend as many as ten conventional forms. The infant learns these forms through frequent associations between actions, objects, and words. Parents often realize that the pre-linguistic infant is beginning to understand what they say. However, they are hard-pressed to demonstrate this ability convincingly.
Researchers deal with this problem by bringing infants into the laboratory, placing them into comfortable highchairs, and asking them to look at pictures, using the technique of visually reinforced preferential looking. A word such as dog is broadcast across loudspeakers. Pictures of two objects are then displayed. In this case, a dog may be on the screen to the right of the baby and a car may be on the screen to the left. If the child looks at the picture that matches the word, a toy bunny pops up and does an amusing drum roll. This convinces the baby that they have chosen correctly, and they then do the best they can to look at the correct picture on each trial. Some infants get fussy after only a few trials, but others last for ten trials or more at one sitting and provide reliable evidence that they have begun to understand a few basic words. Many children show this level of understanding by the tenth month – two or three months before the child has produced a recognizable first word.
Producing the first word is a bit like stepping out on stage. In babbling, the only constraints infants faced were ones arising from their own playfulness and interest. However, when faced with the task of producing standardized word forms, the child’s articulation must be accurate enough to fit within conventional limits. In practice, the forms of early words often deviate radically from the adult standard. Children tend to drop unstressed syllables, producing hippopotamus as poma. They repeat consonants, producing water as wawa. And they simplify and reduce consonant clusters, producing tree as pee. These phonological processes echo similar processes found in the historical development and dialectal variation of adult language. What is different in child language is the fact that so many simplifications occur at once, making so many words difficult to recognize.
As the child’s stock of words grows, it becomes harder to keep words apart from each other. To solve this problem, children must strike a delicate balance between two opposing strategies. On the one hand, children may try to be conservative in their first uses of words. For example, a child may use the word dog to refer only to the family dog and not to any other dog. Or a child may use the word car to refer only to cars parked outside a certain balcony in the house and not cars in any other context. This tendency toward undergeneralization can only be detected if one takes careful note of the contexts in which a child avoids using a word. The flip side of this coin is the strategy of overgeneralization. It is extremely easy to detect overgeneralizations. If the child calls a tiger a kitty, this is clear evidence for overgeneralization.
At first, both undergeneralization and overgeneralization are applied in a relatively uncontrolled fashion. Early undergeneralizations are quickly corrected. For example, parents will soon teach the child that the word dog refers not to just the family dog, but to all the dogs that live on the block, as well as dogs in pictures. The child’s first attempts at generalization are also often wildly overproductive. For example, a child may use the word duck first to refer to the duck, then to the picture of an eagle on the back of a coin, then to a lake where she once saw ducks, and finally to other bodies of water. These ‘pleonastic’ extensions of forms across situations are fairly rare, but they provide interesting commentary regarding the thinking of the toddler when they do occur.
Scholars from Plato to Quine have considered the task of figuring out word meaning to be a core intellectual challenge. Quine (1960) illustrated the problem by imagining a scenario in which a hunter is out on safari with a native guide. Suddenly, the guide shouts “Gavagai!” and the hunter, who does not know the native language, quickly has to infer the meaning of the word. Does it mean “Shoot now!” or “There’s a rhino” or perhaps even “It got away”? Without some additional cues regarding the likely meaning of the word, how can the hunter figure this out? Fortunately, the toddler has more cues to rely on than the hunter. Foremost among these cues is the parent’s use of joint attention and shared eye gaze to establish common reference for objects and actions. If the father says “hippo” while holding a hippopotamus in his hand, the child can use the manual, visual, verbal, and proxemic cues to infer that the word hippo refers to the hippopotamus. A similar strategy works for the learning of the names of easily produced actions such as falling, running, or eating. It also works for social activities such as bath, or bye-bye. The normal child understands the important role of contact through the eyes well before learning the first words. At 3 months, children maintain constant shared eye gaze with their parents. In normal children, this contact persists and deepens over time.
Blind children use touch and other methods to establish a similar domain of shared attention. For many autistic children, contact is less stable and automatic. As a result, autistic children may be delayed in word learning and the general development of communication.
Shared reference is not the only cue toddlers use to pick out the reference of words. They also use the grammatical form of utterances to derive the meanings of new words. For example, if the toddler hears the sentence Here is a zav, it is clear that zav is a common noun. However, in the sentence Here is Zav, Zav must be either a proper noun or perhaps the name of a mass quantity, like sand. If a toddler hears I want some zav, then it is clear that zav is a quantity and not a proper or common noun. Cues of this type can give a child a rough idea of the meaning of a new word (L. B. Smith, 1999). Other sentential frames can give an even more precise meaning. If the child hears This is not green, it is chartreuse, then it is clear that chartreuse is a color. If the child hears Please don’t cover it, just sprinkle it lightly, then the child knows that sprinkle is a verb of the same general class as cover. The use of cues of this type leads to a fast, but shallow, mapping of new words to new meanings.
Throughout the second year, the child struggles with perfecting the sounds and meanings of the first words. For several months, the child produces only isolated single words. However, the real power of language lies in the process of word combination and the child soon realizes the importance of combining predicates such as want, more, or go with arguments such as cookie or Mommy. The association of predicates to arguments is the first step in syntactic development. As in the other areas of language development, these first steps are taken in a very gradual fashion. Before producing a smooth combination of two words such as my horsie, children will often string together a series of single-word utterances that appear to be searching out some syntactic form. For example, a child might say my, that, that, horsie with pauses between each word. Later, the pauses will be gone and the child will say that horsie, my horsie. This tentative combination of words involves groping on both intonational and semantic levels. On the one hand, the child has to figure out how to join words together smoothly in production. On the other hand, the child also has to figure out which words can meaningfully be combined with which others.
As was the case in the learning of single words, the production of the first word combinations is guided by earlier developments in comprehension. Here, again, researchers have used the preferential looking paradigm to measure early sentence comprehension. In a typical form of this experiment, there is a TV monitor to the child’s right with a movie of Big Bird tickling Cookie Monster. To the child’s left, there is a TV monitor with a movie of Cookie Monster tickling Big Bird. The experimenter produces the sentence Big Bird is tickling Cookie Monster. If the child looks at the matching TV monitor, a correct look is scored. Using this technique, researchers have found that 17-month-olds already have a good idea about the correct word order for English sentences. This is about five or six months before they begin to use word order systematically in production.
The grammar of the child’s first combinations is extremely basic. The child learns that each predicate should appear in a constant position vis-à-vis the arguments it requires. For example, in English, the word more appears before the noun it modifies, and the verb run appears after the subject with which it combines. Slot-filler relations can control this basic type of grammatical combination. Each predicate specifies a slot for the argument. For example, more has a slot for a following noun. When a noun, such as milk, is selected to appear with more, that noun becomes a filler for the slot opened up by the word more. The result is the combination more milk. Later, the child can treat this whole unit as an argument to the verb want and the result is want more milk. Finally, the child can express the second argument of the verb want and the result is I want more milk. Thus, the child gradually builds up longer sentences and a more complex grammar. This level of simple combinatorial grammar is based on individual words as the controlling structures. Such word-based control of grammar is important even in adults. In languages with strong morphological marking systems, word-based patterns specify the attachment of affixes, rather than just the linear position of words. In fact, most languages of the world make far more use of morphological marking than does English. In this regard, English is a rather exotic language.
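The slot-filler idea can be illustrated with a minimal sketch. It assumes that each predicate simply records whether it opens a slot before itself, after itself, or both; the class `Predicate` and the function `fill` are hypothetical names chosen for this illustration and are not a claim about how the child's grammar is actually represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A word that opens slots for its arguments (illustrative assumption)."""
    word: str
    takes_preceding: bool = False   # slot before the predicate, e.g. a subject
    takes_following: bool = False   # slot after the predicate, e.g. a noun


def fill(pred, preceding=None, following=None):
    """Fill the predicate's slots and return the resulting word string."""
    if preceding is not None and not pred.takes_preceding:
        raise ValueError(f"{pred.word!r} has no slot for a preceding argument")
    if following is not None and not pred.takes_following:
        raise ValueError(f"{pred.word!r} has no slot for a following argument")
    parts = [p for p in (preceding, pred.word, following) if p is not None]
    return " ".join(parts)


more = Predicate("more", takes_following=True)
want = Predicate("want", takes_preceding=True, takes_following=True)
run = Predicate("run", takes_preceding=True)

step1 = fill(more, following="milk")                # more milk
step2 = fill(want, following=step1)                 # want more milk
step3 = fill(want, preceding="I", following=step1)  # I want more milk
```

Each step reuses the output of the previous one as a filler, mirroring the gradual build-up from more milk to I want more milk. Position is fixed per word, which is the sense in which individual words act as the controlling structures.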
Filling in the missing glue
The child’s first sentences are almost all incomplete and ungrammatical. Instead of saying, This is Mommy’s chair, the child says only Mommy chair with the possessive suffix, pronoun, and copula all deleted. Just as the first words are full of phonological deletions and simplifications, the first sentences include only the most important words, without any of the relational glue. In some cases, children have simply not yet learned the missing words and devices. In other cases, they may know the ‘glue words’ but find it difficult to coordinate the production of so many words in the correct order. Because so much relational structure is missing, early utterances may be highly ambiguous. For example, it is not clear whether the phrase Mommy chair means This is Mommy’s chair or Mommy is sitting in the chair, although the choice between these interpretations may be clear in context.
Children’s learning of grammatical markings is driven by several factors. To begin with, children learn that certain markings are never omitted. For example, the progressive verb suffix -ing is one of the first suffixes learned by the child. This suffix is never omissible and children come to realize this. In addition, children tend to pick up markings that are highly regular and analytic. For example, the suffix -s is a reliable, consistent marker of plurality in English. However, if a form is highly frequent, children will learn it even if it is irregular and non-analytic. For example, children learn past tense forms such as went, came, or fell early on because of their high frequency. At the same time, they are also learning somewhat less frequent regular forms such as wanted and dropped. As the child learns more and more regular forms, the productivity of the regular past tense -ed increases and we find errors such as goed and falled. Productivity for grammatical markings can be demonstrated in the laboratory by teaching children names for new objects. For example, we can show a child a picture of a funny-looking creature and call it a ‘wug’ (Fig. 4). If we then show the child another one of these creatures and ask “What are these?” the child will produce the productive form wugs (MacWhinney, 1978).
Figure 4. Stimuli designed to test children’s ability to inflect novel words productively in the ‘wug’ study.
Children aged 3 also demonstrate some limited productive use of syntactic patterns for new verbs. However, children tend to be conservative and unsure about how to use verbs productively until about age 5. Laboratory experiments with strange new toys and new words tend to encourage a conservative approach. As they get older and braver, children start to show productive use of constructions such as the double object, the passive, or the causative. For example, an experimenter can introduce a new verb like griff in the frame “Tim griffed the ball to Frank” and the 5-year-old will productively generalize to “Tim griffed Frank the ball.”
The control of productivity is based on two complementary sets of cues: semantics and co-occurrence.
When hearing a wug, the child correctly infers that wug is a count noun. Given a picture of a cute little animal, the child also infers that wug is a common count noun naming an animate creature. These semantic features allow the child to generalize the use of the plural suffix to produce the form wugs. At the same time, this extension illustrates the application of co-occurrence learning. The child learns that words that take the indefinite article (a dog, a wug) also form plurals (dogs, wugs). On the other hand, words that take the quantifier some (some bread) do not form plurals. In this way, children use both semantic and co-occurrence information to build up knowledge about the parts of speech. This knowledge can then be fed into existing syntactic generalizations to produce new combinations and new forms of newly learned words. The bulk of grammatical acquisition relies on this process.
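The co-occurrence side of this process can be sketched as a simple tally of the frames in which each noun has been heard. The sketch below is a deliberately crude illustration, not a model from the literature: it assumes that the only cues are the words a and some immediately before a noun, that plurals are formed by adding a bare -s, and that semantic cues are ignored. The function names `learn_frames` and `pluralize` are hypothetical.

```python
from collections import defaultdict


def learn_frames(utterances):
    """Record which of the frames 'a' and 'some' each following word occurs in."""
    frames = defaultdict(set)
    for utterance in utterances:
        words = utterance.lower().split()
        # Look at each adjacent pair: (possible determiner, following word).
        for det, noun in zip(words, words[1:]):
            if det in ("a", "some"):
                frames[noun].add(det)
    return frames


def pluralize(noun, frames):
    """Extend the plural -s only to words heard with the indefinite article.

    Returns None when there is no evidence that the word is a count noun.
    """
    if "a" in frames.get(noun, set()):
        return noun + "s"   # simplification: ignores -es and irregular plurals
    return None


heard = [
    "here is a dog",
    "this is a wug",
    "i want some bread",
]
frames = learn_frames(heard)
```

With this input, `pluralize("wug", frames)` yields wugs because wug was heard after a, whereas `pluralize("bread", frames)` yields no plural because bread was heard only after some. A fuller account would combine these distributional cues with the semantic cues described above.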
Special Gift or emergence?
This overview has tended to view language acquisition as a developmental process, rich with opportunities for learning. The control of vocalization is seen as emerging from practice with the vocal apparatus. The process of trimming the meanings of the first words is viewed as emerging from interactions between parents and children. The learning of the patterns governing word combinations is viewed as emerging from operations on individual lexical items that slowly build up syntactic groups. How can we reconcile an emergentist view of this type with the Special Gift vision favored by Chomskyan linguists? Part of the solution lies in understanding the scope of the two accounts. The emergentist account tends to focus on the moment-to-moment processes of learning, whereas the Special Gift account focuses more on the general issue of whether language learning could occur without at least some genetic guidance.
Evidence for the Special Gift comes from the study of children who have been cut off from communication by cruel parents, ancient Pharaohs, or accidents of nature. The Special Gift position holds that, if the special gift for language is not exercised by some early age, perhaps 6 or 7, it will be lost forever. However, none of the isolation experiments that have been conducted can be viewed as good evidence for this claim. In many cases, the children are isolated because they are brain-injured. In other cases, the isolation itself produces brain injury. In a few cases, children as old as 6-8 years of age have successfully acquired language even after isolation. Thus, the most we can say from these experiments is that it is unlikely that the Special Gift expires before age 8. A better form of evidence of the importance of the Special Gift comes from the manual language produced by deaf children of hearing parents. These children piece together a crude form of communication with certain language-like properties, without guidance from exposure to any standard language.
Figure 5. A transcript in CHAT format linked to a QuickTime movie playable over the web.
A second form of evidence in favor of the notion of a Special Gift comes from the fact that children are able to learn some grammatical structures without apparent guidance from the input. The argumentation involved here is sometimes rather subtle. For example, Chomsky notes that children would never produce “Is the boy who next in line is tall?” as a question deriving from the sentence “The boy who is next in line is tall.” Instead, they will inevitably produce the question as, “Is the boy who is next in line tall?” The fact that children always know which of the forms of the verb “is” to move to the front of the sentence, even without ever having heard such a sentence from their parents, indicates to Chomsky that language must be a Special Gift. Although the details of Chomsky’s argument are controversial, the basic insight here seems solid. There are some aspects of language that seem so fundamental that we hardly need to learn them. One of these is the fact that word combinations join together items that are meaningfully related. It is likely that evolution has provided genetic support for a few core linguistic abilities, including the linkage of sound to meaning and the ordering of words into relational structures.
Language emergence and time scales
A more comprehensive view treats this genetic determination of language structure as a type of emergent process operating on a particular time scale. In general, we can view developmental processes as emerging on five separate time scales:
1. Evolutionary emergence. The slowest-moving emergent structures are those that are encoded in the genes. Emergentist accounts of evolutionary processes emphasize continuity, and the ways in which evolution has reused older forms for new functions. The study of the last three million years of hominid evolution provides good evidence for the emergent and gradual nature of this process.
2. Epigenetic emergence. Translation of the DNA in the embryo triggers a further set of processes from which the initial shape of the organism emerges. The shape of neural development and the structuring of the infant brain emerge from these dynamical interactions.
3. Developmental emergence. Piaget’s genetic psychology (Piaget, 1954) was the first fully articulated emergentist view of development. Current emergentist accounts of human development use mechanisms derived from connectionism, embodiment, and dynamical systems theory to explain the complexities of developmental emergence.
4. On-line emergence. The briefest time frame for the study of emergent processes is that of online language processing. Emergentist accounts are now showing how language structure emerges from the pressures and loads imposed by real-time on-line processing. These pressures involve social processes, memory mechanisms, attentional focusing, and motor control of the vocal tract.
5. Diachronic emergence. We can also use emergentist thinking to understand the changes that languages have undergone across the centuries. These changes emerge from a further complex interaction of the previous three levels of emergence (epigenetic, developmental, and online).
The major challenge now facing the theory of language development is to work out how language structure emerges across each of these diverse time frames. In the search for emergentist explanations, developmentalists are making use of new models and new technological tools. Advances in computing and robotics will soon allow us to build a cybernetic ‘baby’ that can use visual and auditory input to build up a human-like lexicon. By moving through its environment, this robot will develop a spatial and body map much like that of a human infant. Another major advance will rely on the linkage of videotape data to transcripts of interactions of real children with their parents and peers. Using web-based systems like CHILDES (Fig. 5) and TalkBank, researchers will be able to share data that will help us understand how social mechanisms support the development of language and communication.