Observers tend to make unsupported extrapolations from small numbers of events.
A sequence of observations can be thought of as a tiny sample drawn from a vast
population, and in statistical terms we would not expect a small sample to be represen-
tative. However, if a robot successfully traverses a room once, a researcher may well
jump to the conclusion that the robot can always do so. The researcher has reasoned
that the robot was designed to avoid obstacles; it successfully did so; and therefore the
robot was working as intended. But whether this conclusion is reasonable depends
on other context. For example, consider a robot that moves entirely at random. It may
nonetheless traverse the room without encountering obstacles—sometimes, but not
always. If we observed such a robot traversing a room, we might wrongly infer
that it was doing so by design. The general lesson from such cases is that a cautious
researcher should consider whether the assumptions underlying a conclusion are
statistically reasonable.
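To see the force of this point, it can help to estimate how often success occurs by
chance alone. The following sketch, in Python, simulates a hypothetical robot that
takes purely random steps across a grid; the grid size, obstacle count, and step limit
are illustrative assumptions, not taken from any real experiment.

import random

def random_traversal(width=20, n_obstacles=15, max_steps=400, seed=None):
    """Simulate one attempt: start at the left wall, succeed on reaching x == width."""
    rng = random.Random(seed)
    # Hypothetical room: obstacles scattered at random interior cells.
    obstacles = {(rng.randrange(1, width), rng.randrange(width))
                 for _ in range(n_obstacles)}
    x, y = 0, width // 2
    for _ in range(max_steps):
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x = max(x + dx, 0)
        y = min(max(y + dy, 0), width - 1)
        if (x, y) in obstacles:
            return False   # collided with an obstacle
        if x >= width:
            return True    # crossed the room without a collision
    return False           # wandered too long without crossing

trials = 1000
successes = sum(random_traversal() for _ in range(trials))
print(f"chance successes: {successes} / {trials}")

Even this aimless robot crosses the room on some fraction of trials, so a single
successful traversal is weak evidence of purposeful obstacle avoidance.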
A related issue is that of confirmation bias. A researcher runs an experiment, it fails,
a problem is found; runs it again, it fails again, another problem is found; and again;
but eventually the experiment appears to succeed. At this point the researcher claims
success and regards the work as done, and may not even feel any need to mention the
failures. But is the claim justified? A colleague told me of an instance in which he
tweaked his motion-detection software again and again, until it finally worked—but
only later discovered that he had been tweaking one version, but running another, sta-
tic version. The “successful” run was pure luck. Claiming a positive result, detached
from the context of failures, tuning, and exploration in which it was achieved, is not
sound science.
Visualization of Results
We use computers to produce results, and can also use computers to help to digest
them. One approach is to apply statistics. Another approach is to use visualization.
Visualization of data is a substantial field in its own right, with a wide range of
established techniques and principles. These are beyond the scope of this book, but
should be explored by any researcher who has data sets that need rich interpretation.
However, even elementary approaches to reinterpretation of data via graphs can
yield valuable insights. For example, curve fitting can be used to summarize data;
and a graph showing the fitted curve can give a strong sense of whether the fitting
was accurate. Graphs can also be used to interpret data from a variety of perspectives.
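As an illustration, the following Python sketch fits a curve to hypothetical noisy
observations and overlays it on the raw points; the exponential model, the synthetic
data, and the initial parameter guesses are all assumptions made for the example, not
drawn from the experiment discussed below.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Hypothetical observations: counts that decay with depth, plus noise.
depth = np.arange(1, 21)
rng = np.random.default_rng(1)
events = 100 * np.exp(-0.15 * depth) + rng.normal(0, 4, depth.size)

def model(d, a, b):
    """Candidate summary of the data: exponential decay in depth."""
    return a * np.exp(-b * d)

params, _ = curve_fit(model, depth, events, p0=(100, 0.1))

# Plot raw points as markers and the fitted curve as a smooth line;
# seeing both together shows at a glance how well the fit summarizes the data.
d_smooth = np.linspace(depth.min(), depth.max(), 200)
plt.plot(depth, events, "x", label="observed")
plt.plot(d_smooth, model(d_smooth, *params), "-", label="fitted curve")
plt.xlabel("depth")
plt.ylabel("number of events")
plt.legend()
plt.show()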
The upper graph in Fig. 15.2 shows the number of events observed as a parameter,
“depth”, is increased. (This is real data from an experiment in information retrieval.)
The crosses, joined by a jagged line, show the actual number of events. This graph
illustrates that the number of events declines with increasing depth, but inconsistently;
the long-term trend is unclear. A line has been used to connect the crosses to indicate
overall behaviour. However, including the jagged line in such a graph is a mistake,
especially if the number of points is small, as it wrongly suggests that there is a trend
from point to point. A line is an interpolation between two points; if no data can be
validly said to lie in that space, omit the line.
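The following Python sketch makes the contrast concrete, again with hypothetical
data: the same observations are drawn once joined point to point and once as
unconnected markers.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only.
depth = np.arange(1, 11)
rng = np.random.default_rng(7)
events = 50 * np.exp(-0.2 * depth) + rng.normal(0, 5, depth.size)

fig, (ax_joined, ax_markers) = plt.subplots(1, 2, sharey=True)
ax_joined.plot(depth, events, "x-")   # joining the points implies a point-to-point trend
ax_joined.set_title("misleading: joined")
ax_markers.plot(depth, events, "x")   # markers alone make no claim between points
ax_markers.set_title("better: markers only")
for ax in (ax_joined, ax_markers):
    ax.set_xlabel("depth")
ax_joined.set_ylabel("number of events")
plt.show()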
 