Variables
The ideal experiment examines the effect of one variable on the behaviour of an
object being studied. How does increasing the volume of data affect execution time?
Can the vision system track rapidly moving objects? How much compression can
be achieved without visibly degrading the image? If no other variables are present,
it is easy to be confident that the variable does indeed affect the behaviour in the
way observed. The test environment should be designed to minimize the effect of
extraneous factors—that is, to unambiguously relate variations in one property to
variations in another. 1
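As an illustration of varying a single property, the following minimal sketch measures how execution time changes with data volume while holding other factors fixed. Sorting is only a stand-in workload, and the names time_sort and sweep, the fixed seed, and the chosen sizes are assumptions made for this example rather than anything prescribed by the text.

```python
import random
import time


def time_sort(n, repeats=5, seed=0):
    """Best-of-`repeats` wall-clock time to sort n pseudo-random floats."""
    rng = random.Random(seed)            # fixed seed: identical data on every run
    data = [rng.random() for _ in range(n)]
    best = float("inf")
    for _ in range(repeats):
        work = list(data)                # copy outside the timed region
        start = time.perf_counter()
        work.sort()
        best = min(best, time.perf_counter() - start)
    return best


def sweep(sizes, repeats=5):
    """Vary only the data volume; everything else stays the same."""
    return {n: time_sort(n, repeats) for n in sizes}


results = sweep([1_000, 10_000, 100_000])
for n, seconds in results.items():
    print(f"{n:>7} items: {seconds * 1e3:.3f} ms")
```

Taking the minimum over several repetitions is one common way to reduce the influence of extraneous factors such as other processes on the machine; it does not remove them, and the absolute times remain specific to the system used.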
In practice, elimination of variables is remarkably difficult. Even elementary prop-
erties can be surprisingly difficult to measure: for example, access time to material
stored on disk is not just a property of disk hardware, but is affected by access pattern,
presence and size of disk cache, and file system design. Tests should be designed
to yield results that are independent of properties such as system characteristics or
constant-factor overheads that are not part of the hypothesis.
Consider the measurement of performance of two compression techniques. If
tested on different data, the results will be incomparable: we have no way of knowing
whether the better performance is due to use of a better method, or due to choice
of data that is inherently more compressible. Thus one particular component of
a test environment is choice of test data. For some experiments standard data is
available, such as benchmark problems in machine learning or the corpora used
to test compression methods. The use of such standard resources is essential to
experimentation on these problems (although, as noted in Chap. 14, such resources
also have potential limitations). Where standard data is not available, care should be
taken to ensure that the chosen test data is representative.
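The effect of data choice on a compression comparison can be sketched as follows. This is a hedged illustration only: the three methods are those in the Python standard library, and the two inputs (repetitive and noisy) are hypothetical test data invented for the example, not a standard corpus.

```python
import bz2
import lzma
import random
import zlib


def compression_ratios(data):
    """Compressed size divided by original size, for each method on the same input."""
    methods = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
    return {name: len(fn(data)) / len(data) for name, fn in methods.items()}


# Two hypothetical inputs of equal length but very different compressibility.
rng = random.Random(0)
repetitive = b"abcabcabc " * 10_000
noisy = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

for label, data in [("repetitive", repetitive), ("noisy", noisy)]:
    for name, ratio in compression_ratios(data).items():
        print(f"{label:>10} {name:>4} {ratio:.4f}")
```

Every method shrinks the repetitive input to a small fraction of its size and leaves the noisy input essentially unchanged, so the difference between inputs is far larger than the difference between methods. Comparing one method on the first input with another method on the second would therefore say nothing about the methods themselves.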
A fundamental issue is that you should have a clear understanding of the relevant
parameters. In hashing, table size is a tuneable parameter that directly affects aspects
of performance (collisions and cache behaviour, for example); expected key length
and the distribution of key values are others. Some parameters are dependent on others,
1 In careful research published in 1648, Jan-Baptista van Helmont concluded that plants consist of
water:
That all plants immediately and substantially stem from the element water alone I have learnt
from the following experiment. I took an earthen vessel in which I placed two hundred pounds
of earth dried in an oven, and watered with rain water. I planted in it the stem of a willow
tree weighing five pounds. Five years later it had developed into a tree weighing one hundred
and sixty-nine pounds and about three ounces. Nothing but rain (and distilled water) had been
added. The large vessel was placed in earth and covered by an iron lid with a tin-surface that
was pierced with many holes (to allow the soil to breathe while preventing dust from adding to
it -jz). I have not weighed the leaves that came off in the four autumn seasons. Finally I dried
the earth in the vessel again and found the same two hundred pounds of it diminished by about
two ounces. Hence one hundred and sixty-four pounds of wood, bark and roots had come up
from water alone.