Voices in favour of experimentalism as a way of researching software development have recently grown stronger. DeMarco [3] claims that "The actual software construction isn't necessarily experimental, but its conception is. And this is where our focus ought to be. It's where our focus always ought to have been". Meyer [4, 5] has also joined the researchers pointing to the importance of experimentation in SE.
A key component of experimentation is replication. To consolidate a body of knowledge built upon experimental results, those results have to be extensively verified. This verification is carried out by replicating an experiment to check whether its results can be reproduced. If the same results are reproduced in different replications, we can infer that they reflect regularities existing in the piece of reality under study. Experimenters acquainted with such regularities can discover the mechanisms regulating the observed results or, at least, predict their behaviour.
Most of the events observed through experiments in SE nowadays are isolated. In other words, the results of most SE experiments have not been reproduced. There is therefore no way to distinguish between the following three situations: the results were produced by chance (the event occurred accidentally); the results are artifactual (the event only occurs in the experiment, not in the reality under study); or the results really do conform to a regularity of the piece of reality being examined.
A replication has some elements in common with its baseline experiment. When we start to examine a phenomenon experimentally, most aspects are unknown, and even the tiniest change in a replication can lead to inexplicable differences in the results. In immature experimental disciplines, which experimental conditions should be controlled can be found out by starting off with replications that closely follow the baseline experiment [6]. In the case of well-known phenomena, the experimental conditions that influence the results can be controlled, and artifactual results can be identified by running less similar replications. For example, different experimental protocols can be used to verify that the results correspond to experiment-independent events.
The immaturity of ESE has been an obstacle to replication. As the mechanisms regulating software development and the key experimental conditions for its investigation are as yet unknown, even the slightest change in a replication leads to inexplicable differences in the results. At the same time, context differences oblige experimenters to adapt the experiment. These changes can lead to sizeable differences in the replication results that prevent the outcomes of the baseline experiment from being corroborated. In several attempts at combining the results of ESE replications, Hayes [7], Miller [8-10], Hannay et al. [11], Jørgensen [12], Pickard et al. [13], Shull et al. [14] and Juristo et al. [15] reported that the differences between results were so large that it was impossible to draw any conclusions from comparing them.
The ESE stereotype of a replication is an experiment that is repeated independently by other researchers at sites different from the baseline experiment's. But some replications in ESE do not conform to this stereotype: they are jointly run, the replicating researchers reuse some of the materials employed in the baseline experiment, or they are run at the same site [16-25]. How replications should be