Table 2-2. Hypothetical times to execute two tests
                   Baseline      Specimen
First iteration    1.0 seconds   0.5 seconds
Second iteration   0.8 seconds   1.25 seconds
Third iteration    1.2 seconds   0.5 seconds
The average of the specimen says there is a 25% improvement in the code. How confident
can we be that the test really reflects a 25% improvement? Things look good: two of the
three specimen values are less than the baseline average, and the size of the improvement is
large—yet when the analysis described in this section is performed on those results, it turns
out that the probability the specimen and the baseline have the same performance is 43%.
When numbers like these are observed, 43% of the time the underlying performance of the
two tests is the same. Hence, performance is different only 57% of the time. This, by the
way, is not exactly the same thing as saying that 57% of the time the performance is 25%
better, but more about that a little later.
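To make the arithmetic behind the 25% figure concrete, here is a minimal sketch that averages the
two series from Table 2-2 and computes the relative improvement. The class and method names are
made up for illustration, and the column assignment (first column baseline, second column
specimen) is inferred from the discussion above.

```java
// Illustrative sketch only; AverageImprovement is a hypothetical name.
final class AverageImprovement {
    // Arithmetic mean of a series of timings.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Fraction by which the specimen average is lower than the baseline average.
    static double improvement(double[] baseline, double[] specimen) {
        double b = mean(baseline);
        return (b - mean(specimen)) / b;
    }

    public static void main(String[] args) {
        double[] baseline = {1.0, 0.8, 1.2};   // seconds, from Table 2-2
        double[] specimen = {0.5, 1.25, 0.5};  // seconds, from Table 2-2
        System.out.printf("baseline average: %.2f s%n", mean(baseline));
        System.out.printf("specimen average: %.2f s%n", mean(specimen));
        System.out.printf("improvement:      %.0f%%%n",
                100.0 * improvement(baseline, specimen));
    }
}
```

Note that the averages alone say nothing about how much the individual runs bounce around, which
is exactly why the 25% figure cannot be taken at face value.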
These probabilities seem different than might be expected because of the large variation
in the results. In general, the larger the variation in a set of results, the harder it is to
guess whether the difference in the averages is real or due to random chance.
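One way to see how large the variation is: compute the sample variance and standard deviation of
each series. The sketch below does that for the Table 2-2 values; the class name is hypothetical,
and the use of the sample (n - 1) variance is an assumption about how the spread would normally
be measured for a t-test.

```java
// Illustrative sketch only; VariationSketch is a hypothetical name.
final class VariationSketch {
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample variance: squared deviations from the mean, divided by (n - 1).
    static double sampleVariance(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return sumSq / (xs.length - 1);
    }

    static double sampleStdDev(double[] xs) {
        return Math.sqrt(sampleVariance(xs));
    }

    public static void main(String[] args) {
        double[] baseline = {1.0, 0.8, 1.2};   // seconds, from Table 2-2
        double[] specimen = {0.5, 1.25, 0.5};  // seconds, from Table 2-2
        System.out.printf("baseline: variance %.4f, std dev %.3f s%n",
                sampleVariance(baseline), sampleStdDev(baseline));
        System.out.printf("specimen: variance %.4f, std dev %.3f s%n",
                sampleVariance(specimen), sampleStdDev(specimen));
    }
}
```

The specimen's standard deviation works out to more than half of its own 0.75-second average,
which is a lot of noise for only three samples.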
This number, 43%, is the result of Student's t-test, which is a statistical analysis
based on the series and their variances. Student, by the way, is the pen name of the scientist
who first published the test; it isn't named that way to remind you of graduate school where
you (or at least I) slept through statistics class. The t-test produces a number called the
p-value, which refers to the probability that the null hypothesis for the test is true. (There are
several programs and class libraries that can calculate t-test results; the numbers produced in
this section come from using the TTest class of the Apache Commons Mathematics Library.)
The null hypothesis in regression testing is the hypothesis that the two tests have equal
performance. The p-value for this example is roughly 43%, which means the confidence we can
have that the series converge to the same average is 43%. Conversely, the confidence we
have that the series do not converge to the same average is 57%.
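For readers who want to see where a p-value like this comes from, the following is a rough,
dependency-free sketch of a two-sided t-test that does not assume equal variances (often called
Welch's t-test). That variant is an assumption on my part: the text says only that the TTest
class was used, not which of its methods. The class name and the numerical integration shortcut
are also mine; a production library evaluates the t distribution differently.

```java
// Illustrative sketch only; WelchTTestSketch is a hypothetical name.
final class WelchTTestSketch {
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample variance (divides by n - 1).
    static double variance(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return sumSq / (xs.length - 1);
    }

    // t statistic: difference of the means over its estimated standard error.
    static double tStatistic(double[] a, double[] b) {
        double se2 = variance(a) / a.length + variance(b) / b.length;
        return (mean(a) - mean(b)) / Math.sqrt(se2);
    }

    // Welch-Satterthwaite approximation for the degrees of freedom.
    static double degreesOfFreedom(double[] a, double[] b) {
        double va = variance(a) / a.length;
        double vb = variance(b) / b.length;
        return (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
    }

    // Simpson's rule for the integral of cos(theta)^power over [0, upper].
    static double cosPowerIntegral(double power, double upper) {
        int n = 2000; // number of intervals; must be even
        double h = upper / n;
        double sum = 1.0 + Math.pow(Math.max(0.0, Math.cos(upper)), power);
        for (int i = 1; i < n; i++) {
            double f = Math.pow(Math.max(0.0, Math.cos(i * h)), power);
            sum += (i % 2 == 1 ? 4.0 : 2.0) * f;
        }
        return sum * h / 3.0;
    }

    // Two-sided p-value. Substituting x = sqrt(df) * tan(theta) turns the
    // t density into a multiple of cos(theta)^(df - 1), so the probability
    // of |T| <= t is a ratio of two finite integrals. Needs df >= 1, which
    // holds whenever each series has at least two values.
    static double pValue(double[] a, double[] b) {
        double t = Math.abs(tStatistic(a, b));
        double df = degreesOfFreedom(a, b);
        double thetaT = Math.atan(t / Math.sqrt(df));
        double inside = cosPowerIntegral(df - 1.0, thetaT)
                / cosPowerIntegral(df - 1.0, Math.PI / 2.0);
        return 1.0 - inside;
    }

    public static void main(String[] args) {
        double[] baseline = {1.0, 0.8, 1.2};   // seconds, from Table 2-2
        double[] specimen = {0.5, 1.25, 0.5};  // seconds, from Table 2-2
        System.out.printf("t = %.3f, df = %.2f, p = %.2f%n",
                tStatistic(baseline, specimen),
                degreesOfFreedom(baseline, specimen),
                pValue(baseline, specimen));
    }
}
```

With Commons Math on the classpath, calling `new TTest().tTest(baseline, specimen)` (from the
`org.apache.commons.math3.stat.inference` package in the 3.x releases) should report a comparable
value; which release produced the numbers in this section is not stated, so treat any small
difference as expected.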