Observational methods (child development)

 

Introduction

Observational methods admit of a variety of meanings, but two stand out. According to the broader of the two meanings, they might include procedures by which informed observers produce narrative reports, such as those by Jean Piaget or Charles Darwin and other baby biographers when describing the development of their infants. Such reports have greatly enriched our understanding of child development, but require talent and wisdom on the part of the observer that is not easily reduced to a list of techniques and tools. In contrast, according to the narrower meaning, observational methods are often understood by students of child development to refer to procedures that result in quantification of the behavior observed. The requisite techniques and tools are relatively easy to describe and are the subject of this entry.

If data are understood as generally quantitative, then data collection means measurement, which is defined by procedures that, when applied to things or events, produce scores. This entry describes measurement procedures that permit investigators of child development to extract scores from observed behavior that can then be analyzed with conventional statistical techniques. Other entries in this first section of Part II focus on different data collection techniques such as parental and teacher rating scales, whereas the other two sections consider issues of research design and data analysis. The first three sections of Part II work together. Matters of design (second section) define the circumstances of data collection, and measurement produces the scores that then become grist for the data analytic mill (third section).

What makes observational methods different from other measurement approaches? In an attempt to address this question, I consider five topics in turn and explain their relevance for observational methods. These topics are coding schemes, coding and recording, representing, reliability, and reducing. Then, at the end of this entry, I will address two further questions: for what circumstances are these methods recommended? And what kinds of researchers have found them useful?

Coding schemes

Coding schemes, which are measuring instruments just like rulers and thermometers, are central to observational methods. They consist of sets of pre-defined behavioral categories representing the distinctions that an investigator finds conceptually meaningful and wishes to study further. One classic example is Parten’s (1932) coding scheme for preschool children’s play. She defined six categories (viz., unoccupied, onlooking, solitary, parallel, associative, and cooperative) and then asked coders to observe children for one minute each on many different days and to assign the most appropriate code to each minute.

Examples of other coding schemes can be found in Bakeman & Gottman (1997) and throughout the literature generally, but most share one feature: like Parten’s scheme, they consist of a single set of mutually exclusive and exhaustive codes (i.e., there is a code for each event, but in each instance only one applies) or of several such sets, with each set coding a different dimension of interest. In the simplest case, a set could consist of just two codes, presence or absence of the event. Thus, if observers were asked to note occurrences of five different behaviors, any of which could co-occur, this could be regarded as five sets with each set containing two codes, “yes” or “no.”
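The idea of mutually exclusive and exhaustive sets can be made concrete with a minimal sketch. Parten's six categories form one set; the five co-occurring behaviors are treated as five separate yes/no sets. The behavior names and the helper `code_minute` are hypothetical and serve only as illustration.

```python
from enum import Enum


class Play(Enum):
    """Parten's (1932) six play categories: one mutually exclusive, exhaustive set."""
    UNOCCUPIED = "unoccupied"
    ONLOOKING = "onlooking"
    SOLITARY = "solitary"
    PARALLEL = "parallel"
    ASSOCIATIVE = "associative"
    COOPERATIVE = "cooperative"


# Hypothetical behaviors that may co-occur; each is its own two-code (yes/no) set.
BEHAVIORS = ("smile", "vocalize", "gaze", "reach", "cry")


def code_minute(play, present):
    """Return one record: exactly one play code plus yes/no for each behavior."""
    play = Play(play)  # raises ValueError if the label is not in the scheme
    unknown = set(present) - set(BEHAVIORS)
    if unknown:
        raise ValueError(f"not in scheme: {sorted(unknown)}")
    return {"play": play, **{b: (b in present) for b in BEHAVIORS}}


record = code_minute("parallel", {"smile", "gaze"})
```

Because every minute receives exactly one `Play` value, the set is exclusive by construction; a label outside the scheme is rejected rather than silently recorded.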

It is sometimes objected that coding schemes are too restrictive and that pre-defined codes may allow potentially interesting behavior to escape unremarked. Earlier, I referred to observing without the structure of a coding scheme as observation in a broad sense, and I assume that such qualitative, unfettered observation occurs while coding schemes are being developed and will influence the final coding schemes. Once defined, however, coding schemes have the merits of replicability and greater objectivity that they share with other quantitative methods. Even so, coders should remain open to the unexpected and make qualitative notes as circumstances suggest. Further refinement of the measuring instruments is always possible.

Coding and recording

Armed with coding schemes, and presented with samples of behavior, observers are expected to categorize (i.e., code) quickly and efficiently various aspects of the behavior passing before their eyes. One basic question concerns the coding unit: to what entity is a code assigned? Is it a neatly bounded time interval such as the single minute used by Parten? Or is it successive n-second intervals as is often encountered in the literature? Or is it an event of some sort? For example, observers might be asked to identify episodes of struggles over objects between preschoolers and then code various dimensions of those struggles. Alternatively, as often happens, they might be asked to segment the stream of behavior into sequences of events or states, coding the type of the event and its onset and offset times.
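When the stream of behavior is segmented into states, each coded unit carries a code plus onset and offset times. The following sketch shows one possible representation; the class name, the example stream, and the check that states tile the session without gaps are my assumptions for illustration, not a standard format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    code: str
    onset: float   # seconds from the start of the session
    offset: float  # seconds from the start of the session

    @property
    def duration(self):
        return self.offset - self.onset


def check_segmentation(states):
    """Verify that each state begins exactly where the previous one ended."""
    for prev, cur in zip(states, states[1:]):
        if cur.onset != prev.offset:
            raise ValueError(f"gap or overlap between {prev} and {cur}")
    return True


# Hypothetical one-minute stream, segmented into play states.
stream = [
    State("solitary", 0.0, 22.0),
    State("onlooking", 22.0, 31.0),
    State("parallel", 31.0, 60.0),
]
check_segmentation(stream)

# Total time spent in each state.
total_by_code = {}
for s in stream:
    total_by_code[s.code] = total_by_code.get(s.code, 0.0) + s.duration
```

With onset and offset times preserved, frequencies, durations, and sequences can all be derived later from the same records.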

A second basic question concerns the scale of measurement. Most coding schemes require observers to make categorical (or nominal) judgments, yet some coding schemes ask them to make ordinal judgments (e.g., rating the emotional tone of each n-second interval on a 1 to 7 scale). Categorical judgments are also called qualitative and should not be confused with qualitative reports: the counts and sequences that result from categorical measurement can be subjected to quantitative analysis in a way that qualitative narrative reports cannot, unless the qualitative reports are themselves coded.

Some observations can be automated (e.g., the position of an animal in an enclosure or a person’s physiological responses). In contrast, coding schemes used in child development, especially when social behavior is studied, often require human judgment and would be difficult if not impossible to automate. Human observers are required and need to record their judgments in some way. It is possible to observe behavior live in real time, recording the judgments made simply with pencil and paper, some sort of hand-held electronic device, or a specially programmed laptop computer. More likely, the behavior of interest will be video recorded for later coding, which permits multiple viewings, in both real time and slow motion, and reflection (literally, re-view) in a way live observation does not. With today’s video systems, time will usually be recorded as a matter of course, but it has not always been so.

Especially in older literature, observers used interval recording, which is often called zero-one or partial-interval or simply time sampling (Altmann, 1974). Typically, rows on a paper recording form represented quite short successive intervals (e.g., 15 seconds) and columns represented particular behaviors; observers then noted with a tick mark the behaviors that occurred within each interval. The intent of the method was to provide approximate estimates of both frequency and duration of behaviors in an era before readily available recording devices automatically preserved time. It was a compromise, reflecting the technology of the time, and no longer seems to be recommended.
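Why interval recording yields only approximate estimates can be seen in a small sketch. It applies the zero-one rule (tick an interval if the behavior occurred in it at all) to timed bouts and compares the result with the true frequency and duration. The bouts, session length, and function name are made up for illustration.

```python
def zero_one(bouts, session_len, interval):
    """Tick each interval in which the behavior occurred at all."""
    n = int(session_len // interval)
    ticks = []
    for i in range(n):
        start, end = i * interval, (i + 1) * interval
        # A bout touches the interval if it starts before the end
        # of the interval and stops after its start.
        ticks.append(any(on < end and off > start for on, off in bouts))
    return ticks


# Hypothetical bouts of one behavior as (onset, offset) in seconds.
bouts = [(2, 4), (6, 9), (11, 13), (20, 31)]
ticks = zero_one(bouts, session_len=60, interval=15)

true_frequency = len(bouts)                          # number of bouts
true_duration = sum(off - on for on, off in bouts)   # seconds of behavior
ticked = sum(ticks)                                  # intervals with a tick
duration_estimate = ticked * 15                      # seconds implied by ticks
```

In this example three short bouts collapse into a single tick, while one longer bout spreads across two intervals, so the tick count matches neither the number of bouts nor, once multiplied by the interval length, the time actually spent in the behavior.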

Representing

Occasionally, investigators may refer to video recordings as data, but making a video recording is not the same as recording data. Thus, the question arises: how should coding of video recordings be recorded? More generally, how should any data be represented (literally, re-presented) for subsequent computer processing? If a low-tech approach to coding relies only on pencil and paper and the naked eye, and a high-tech approach connects computers and video recordings, then a relatively mid-tech approach to coding video material might use video recording but rely on a visual time code displayed on the monitor (instead of an internal, electronically recorded one). This would allow observers to record not just behavioral codes, but also the times at which they occurred. Almost always, data will ultimately be processed by computer, so observers viewing video could use pencil and paper for their initial records and then enter the data in computer files later. Alternatively, they could key their observations directly into a computer as they worked, whichever they find easier. Such a system retains all the advantages that accrue to coding previously video-recorded material, and is attractive when budgets are constrained.

When feasible, a more high-tech approach has advantages, and a number of systems are available. Such systems combine video recordings and computers in ways that serve to automate the mechanics of coding. Perhaps the best known is The Observer (Noldus, Trienes, Henriksen, Jansen, & Jansen, 2000). In general, computer-based coding systems permit researchers to define the codes they use and their attributes. Coders can then view previously video-recorded information in real time or slow motion as they decide how the material should be coded. Subsequently, computer programs organize codes and their associated times into computer files. Such systems tend to the clerical tasks, freeing coders to focus on their primary task, which is making decisions as to how behavior should be coded.

Table 1. An agreement matrix.

                                         Observer B
Observer A    Unoccupied  Onlooking  Solitary  Parallel  Associative  Cooperative
Unoccupied             7          2         0         0            0            0
Onlooking              1         13         1         3            0            0
Solitary               3          0        24         4            1            0
Parallel               0          0         1        27            3            0
Associative            0          0         0         2            9            3
Cooperative            0          0         0         0            0            6

Rows represent Observer A and columns Observer B. In this case, 110 samples were coded. Percentage agreement was 78 percent (i.e., 86 of the 110 tallies were on the upper-left to lower-right diagonal, representing exact agreement). The pattern of disagreements (i.e., off-diagonal tallies) suggests that Observer B sees more organized behavior than Observer A (e.g., 4 samples that Observer A coded Solitary, Observer B coded Parallel; 3 samples that Observer A coded Parallel, Observer B coded Associative; and another 3 samples that Observer A coded Associative, Observer B coded Cooperative; the corresponding Observer B to A errors occur only 1, 2, and 0 times). Thus, even though the kappa is a respectable .72, recalibration of the observers is suggested.

No matter how coding judgments are captured initially, they can be reformatted using Sequential Data Interchange Standard (SDIS) conventions for sequential data; such data files can then be analyzed with the Generalized Sequential Querier (GSEQ), a program for sequential observational data that has considerable flexibility (for both SDIS and GSEQ, see Bakeman & Quera, 1995).

Reliability

The accuracy of any measuring device needs to be established before weight can be given to the data collected with it. For the sort of observational systems described here, the instrument consists of trained human observers applying a coding scheme or schemes to streams of behavior, often video-recorded. Thus, the careful training of observers and establishing their reliability is an important part of the enterprise. As previously noted, usually observers are asked to make categorical distinctions. For this reason, the most common statistic used to establish inter-observer reliability is Cohen’s kappa, a coefficient of agreement for categorical scales (Bakeman & Gottman, 1997). Cohen’s kappa corrects for chance agreement and thus is much preferred to the percentage agreement statistics sometimes used, especially in older literature. Moreover, the agreement matrix required for its computation is useful when training observers due to the graphic way it portrays specific sources of disagreement (see Table 1).
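The figures reported with Table 1 can be reproduced with a short sketch. It computes percentage agreement and Cohen's kappa from the agreement matrix using the usual definition: observed agreement corrected by the agreement expected from the two observers' marginal totals. The function and variable names are mine; the tallies are those of Table 1.

```python
def cohen_kappa(matrix):
    """Return (observed agreement, kappa) for a square agreement matrix."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Observed agreement: proportion of tallies on the diagonal.
    p_obs = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement: products of corresponding row and column totals.
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Tallies from Table 1 (rows: Observer A; columns: Observer B).
TABLE_1 = [
    [7, 2, 0, 0, 0, 0],    # Unoccupied
    [1, 13, 1, 3, 0, 0],   # Onlooking
    [3, 0, 24, 4, 1, 0],   # Solitary
    [0, 0, 1, 27, 3, 0],   # Parallel
    [0, 0, 0, 2, 9, 3],    # Associative
    [0, 0, 0, 0, 0, 6],    # Cooperative
]

p_obs, kappa = cohen_kappa(TABLE_1)
print(f"agreement = {p_obs:.2f}, kappa = {kappa:.2f}")  # agreement = 0.78, kappa = 0.72
```

Kappa is lower than percentage agreement here because roughly a fifth of the agreements would be expected by chance alone, given how often each observer used each code.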

Reducing and analyzing

Observational methods often result in voluminous data; thus, data reduction is often a necessary prelude to analysis. A useful strategy is to collect slightly more detailed data than one intends to examine. In such cases, initial data reduction will consist of lumping some codes. Other data reduction may involve computation of conceptually targeted indices (e.g., an index of the extent to which mothers are responsive to the gaze of their infants), which then serve as scores for multiple regression or other kinds of statistical analyses. Several examples of this useful and productive strategy for observational data are given in Bakeman & Gottman (1997).
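Lumping codes and deriving a score can be sketched as follows. The grouping of Parten's six codes into three broader ones, the index itself, and the example data are all hypothetical; they illustrate the strategy, not a recommended scheme.

```python
from collections import Counter

# Hypothetical lumping of Parten's six codes into three broader ones.
LUMP = {
    "unoccupied": "nonsocial",
    "onlooking": "nonsocial",
    "solitary": "nonsocial",
    "parallel": "parallel",
    "associative": "social",
    "cooperative": "social",
}


def social_play_index(minute_codes):
    """Proportion of coded minutes that fall in the lumped 'social' category."""
    lumped = Counter(LUMP[c] for c in minute_codes)
    return lumped["social"] / sum(lumped.values())


# Hypothetical minute-by-minute codes for one child.
child = ["solitary", "parallel", "associative", "cooperative",
         "onlooking", "parallel", "associative", "solitary"]
score = social_play_index(child)
```

One such score per child could then enter a regression or other analysis alongside scores from other measures.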

Conclusions

Historically, observational methods have proven useful when process aspects of behavior are emphasized more than behavioral outcomes, or for studying any behavior that unfolds over time. They have been widely used for studying non-verbal organisms (e.g., infants), non-verbal behavior generally, and all kinds of social interaction. Mother-infant interaction and emotion regulation are two areas in which observational methods have been widely used, but others include school and classroom behavior. Observational methods seem to have a kind of naturalness not always shared with other measurement strategies. Observers are not always passive or hidden and situations are often contrived, and yet the behavior captured by these methods seems freer to unfold, reflecting a target’s volition more than seems the case with, for example, self-report questionnaires. Self-reflection is not captured, but aspects of behavior outside immediate articulated awareness often are.

With recent advances in technology, observational methods have become dramatically easier. Handheld devices can capture digital images and sound, computers permit playback and coding while automating clerical functions, and computer programs permit flexible data reduction and analysis. In the past, potential users of observational methods may have been dissuaded by technical obstacles. Whether or not future investigators select observational methods will come to depend primarily on whether these methods are appropriate for the behavior under investigation.
