STATISTICALLY IMPORTANT
Statistical significance does not mean statistical importance. A baseline with little variance that
averages 1 second and a specimen with little variance that averages 1.01 seconds may have a
p-value of 0.01: there is a 99% probability that there is a difference in the result.
The difference itself is only 1%. Now say a different test shows a 10% regression between
specimen and baseline, but with a p-value of 0.2: not statistically significant. Which test
warrants the most precious resource of all, additional time to investigate?
Although there is less confidence in the case showing a 10% difference, time is better spent
investigating that test (starting, if possible, with getting additional data to see if the result is actually
statistically significant). Just because the 1% difference is more probable doesn't mean that it is
more important.
The usual reason a test is statistically inconclusive is that there isn't enough data in the
samples. So far, the example here has looked at a series with three results in the baseline and
the specimen. What if three additional results are added, again yielding 1, 1.2, and 0.8
seconds for the baseline, and 0.5, 1.25, and 0.5 seconds for the specimen? With the additional
data, the p-value drops from 0.43 to 0.19: the probability that the results are different has
risen from 57% to 81%. Running additional tests and adding the three data points again
increases the probability to 91%, past the usual level of statistical significance.
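
The effect of sample size on the p-value can be reproduced with a short program. The following is a minimal sketch, not the method used to produce the figures above: it assumes they come from a two-sided t-test that does not assume equal variances (Welch's form), which the passage itself does not name. The class WelchTTest and all of its method names are hypothetical. To stay free of external libraries, the p-value is obtained by numerically integrating the t distribution (after the substitution x = sqrt(nu) * tan(theta)) rather than by calling a statistics package.

```java
// Sketch only: class and method names are illustrative, not from the text.
// Assumes each sample has at least two values and is not constant.
final class WelchTTest {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double variance(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return ss / (xs.length - 1);
    }

    // Welch t statistic: difference of the means over its standard error.
    static double tStatistic(double[] a, double[] b) {
        double se2 = variance(a) / a.length + variance(b) / b.length;
        return (mean(a) - mean(b)) / Math.sqrt(se2);
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    static double degreesOfFreedom(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    // Simpson's rule for the integral of cos(theta)^power over [0, upper].
    static double cosPowerIntegral(double upper, double power) {
        int n = 20_000;                       // even number of intervals
        double h = upper / n;
        double sum = 1.0 + Math.pow(Math.cos(upper), power); // endpoints
        for (int i = 1; i < n; i++) {
            double weight = (i % 2 == 1) ? 4.0 : 2.0;
            sum += weight * Math.pow(Math.cos(i * h), power);
        }
        return sum * h / 3.0;
    }

    // Two-sided p-value. With x = sqrt(nu) * tan(theta), the t density is
    // proportional to cos(theta)^(nu - 1), so the probability mass inside
    // [-t, t] is a ratio of two integrals and no gamma function is needed.
    static double pValue(double[] a, double[] b) {
        double t = Math.abs(tStatistic(a, b));
        double nu = degreesOfFreedom(a, b);
        double inside = cosPowerIntegral(Math.atan(t / Math.sqrt(nu)), nu - 1.0);
        double total = cosPowerIntegral(Math.PI / 2.0, nu - 1.0);
        return 1.0 - inside / total;
    }

    // Repeats a sample to mimic rerunning the test with the same results.
    static double[] repeat(double[] xs, int times) {
        double[] out = new double[xs.length * times];
        for (int i = 0; i < out.length; i++) out[i] = xs[i % xs.length];
        return out;
    }

    public static void main(String[] args) {
        double[] baseline = {1.0, 1.2, 0.8};
        double[] specimen = {0.5, 1.25, 0.5};
        for (int runs = 1; runs <= 3; runs++) {
            double p = pValue(repeat(baseline, runs), repeat(specimen, runs));
            System.out.printf("%d results each: p = %.2f%n", 3 * runs, p);
        }
    }
}
```

Under these assumptions the three printed p-values should come out close to the figures quoted above and shrink as data is added, even though the averages never change. In real code, a tested statistics library is a better choice than hand-rolled integration.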
Running additional tests until a level of statistical significance is achieved isn't always
practical. It isn't, strictly speaking, necessary either. The choice of the α-value that determines
statistical significance is arbitrary, even if the usual choice is common. A p-value of 0.11 is
not statistically significant within a 90% confidence level, but it is statistically significant
within an 89% confidence level.
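
The arbitrariness of the cutoff is easy to see in code. This is a small sketch with hypothetical names (SignificanceCheck, isSignificant). It assumes that α is 1 minus the confidence level and that a p-value equal to α counts as significant, which is how the 89% example above reads. α is passed directly instead of being computed from the confidence level, so floating-point rounding cannot flip the boundary case.

```java
// Sketch only: names are illustrative, not from the text.
final class SignificanceCheck {

    // Assumption: p <= alpha counts as significant (boundary included).
    static boolean isSignificant(double pValue, double alpha) {
        return pValue <= alpha;
    }

    public static void main(String[] args) {
        double p = 0.11;
        System.out.println(isSignificant(p, 0.10)); // 90% confidence level: false
        System.out.println(isSignificant(p, 0.11)); // 89% confidence level: true
    }
}
```

The same measurement passes or fails depending only on the threshold chosen, which is why the p-value itself is more informative than a yes/no verdict.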
The conclusion here is that regression testing is not a black-and-white science. You cannot
look at a series of numbers (or their averages) and make a judgment that compares them
without doing some statistical analysis to understand what the numbers mean. Yet even that
analysis cannot yield a completely definitive answer due to the laws of probabilities. The job
of a performance engineer is to look at the data, understand those probabilities, and determine
where to spend time based on all the available data.