Statistical Tools
Statistical tools that have wide application in computer science research include
correlation, regression, and hypothesis testing. Measures of correlation are used to
determine whether two variables depend on each other. Regression is used to identify
the relationship between two variables. These can be used, for example, to determine
whether input size affects speed or whether light intensity affects object recognition.
Given the variability inherent in the experimental output, how do we know that the
results we observe are due to some real effect, and not just to chance? Understanding
this core question is fundamental to understanding not only which statistical tests
to use, but also how to design experiments, and what conclusions can be drawn
from them.
The principal concepts of statistical inference can be seen through a simple example (which I present here in some detail, because these concepts are often misunderstood). Consider the experiment of trying to determine whether a coin is biased; that is, whether the coin has a probability of coming up heads that is other than 50 %. Suppose the coin is flipped 12 times, and heads is observed 9 times. Taken naïvely, the results of our experiment might suggest that the coin is biased: three-quarters of the
flips have turned up heads. But even if the coin is unbiased, on any given sequence
of flips the proportion of heads may diverge from 50 %; any sequence of coin flips
is possible.
The question we have to ask instead is, if a coin is unbiased, how likely are we to
observe 9 heads or more from 12 flips? If this likelihood is sufficiently small, then we
can with confidence—though not with certainty—conclude that the coin is biased.
There are 2¹² = 4096 distinct sequences of coin tosses. The number that have 12 heads is 1; that have 11 heads is 12; that have 10 heads is 12 × 11/2 = 66; and that have 9 heads is (12 × 11 × 10)/(3 × 2) = 220. So there is a total of 220 + 66 + 12 + 1 = 299 ways of getting at least 9 heads. If the coin is unbiased, then any given sequence of flips, such as hhththhtthth, is as likely as any other sequence, even tttttttttttt. Therefore the probability of flipping 9 or more heads with 12 flips of an unbiased coin is 299/4096 = 7.3 %. A common experimental protocol is to set a threshold of 5 % probability or less before we are confident in a conclusion;³ the probability here is slightly too high to confidently reject the possibility that the coin is unbiased towards heads.
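The counting argument above is easy to check directly. A minimal sketch using Python's math.comb (the function name prob_at_least is my own):

```python
from math import comb

def prob_at_least(heads: int, flips: int) -> float:
    """One-tailed probability of observing `heads` or more heads
    in `flips` tosses of a fair coin."""
    # Count the sequences with at least `heads` heads ...
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    # ... out of 2**flips equally likely sequences.
    return favourable / 2 ** flips

# 220 + 66 + 12 + 1 = 299 favourable sequences out of 4096
p = prob_at_least(9, 12)   # 299/4096, about 0.073
```

Since p is above the 5 % threshold, the test does not reject the null hypothesis of an unbiased coin.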
This example illustrates most of the important concepts behind statistical hypothesis testing. The supposition that "the result was by chance" is represented by our
null hypothesis—that the coin is truly unbiased. The result we are testing is stated
as the alternative hypothesis—that the coin has a positive bias. We then calculate
the likelihood of the observed or a more extreme result, of 9 or more heads, on the
assumption that the null hypothesis is true. This is known as a one-tailed test. For
/
4096
=
7
.
³ The question of whether and when this protocol is correct or appropriate is beyond the scope of this topic. The use of thresholds and particular statistical tests is a continuing topic of scientific debate, and methodologies continue to develop. What is clear is that some use of hypothesis testing is preferable to simple reporting of averages and claimed "improvements".