Experimentation - Writing for Computer Science

Information Technology Reference

In-Depth Information

outcomes. And if increasing a value has an effect, what happens with a further

increase? That is, you should explore factors broadly enough to make trends and

patterns clear.

Consider the difficulties in measuring how the performance of a distributed algo-

rithm changes as the number of servers is increased. Suppose the volume of data is

fixed. As the number of servers is increased, it may be that at some point the data on

each machine can be held wholly in memory (because the per-machine volume of

data is falling), so that disk is not required. The experiment might then show a dra-

matic increase in performance; but this would in fact be due to the reduced problem

size, not to the power of distribution. On the other hand, if the data volume is scaled

with number of machines, then the characteristics of the data set as a whole may be

altered. In some cases there is no straightforward resolution to such issues.

At the start of Chap. 4 , I introduced an example of a research program that was

designed to investigate the impact of cache on the relative performance of two data

structures. This example illustrates some of the difficulties in producing a robust,

persuasive experimental design.

First, my proposal was that performance be assessed by counting cache misses

(an operation that may be supported in hardware, or may involve use of a simulator).

This approach produces evidence that is clearly related to the hypothesis, which

concerns cache behaviour, but does not directly demonstrate an improvement in

computational efficiency. 3 Second, the relative behaviour will depend on the data

volumes. For example, it may be that, as the amount of data is increased, the costs

for the tree-based algorithm grow slowly until some thresholds are exceeded, then

grow dramatically as just a little more data is introduced, and then grow slowly

again. Third, behaviour with growth might be obvious for some data sets, and less

conspicious for others, depending on the pattern of accesses.

To produce strong experimental designs for this problem, researchers needs to

use their understanding (that is, models, formal or otherwise) of the algorithms to

anticipate whether issues of robustness will arise, and then to choose measures and

data that have the potential to expose inconsistencies between the models and the

behaviour is practice.

Performance of Algorithms

As an example of experimental design, consider potential approaches to assessment

of the performance of algorithms. The tools, as discussed in Chap. 4 , are formal proof,

mathematical modelling, simulation, and experimentation. How these are used flows

3 Is computational efficiency even well defined? Is it the number of instructions used, say, or the

number of seconds each CPU core spends on the process? These are no longer equivalent. If the

core is otherwise idle, due to memory access delays, is processing time relevant at all? The answers

to these issues will depend on the research question.

Search WWH ::

Custom Search

Home