represents a “general feature” of empirical reality. The problem may occur when one
tries to use a normal-distribution-based test to analyze data from variables that are
not normally distributed. In such cases, we have two general choices. First, we can
use some alternative “nonparametric” test (a.k.a. “distribution-free test”), but this
often is inconvenient because such tests typically are less powerful and less flexible
in terms of types of conclusions that they can provide. Alternatively, in many cases
we can still use the normal-distribution-based test if we only make sure that the size
of our samples is large enough. The latter option is based on an extremely important
principle, which is largely responsible for the popularity of tests that are based on
the normal function. Namely, as the sample size increases, the shape of the sampling
distribution (i.e., distribution of a statistic from the sample; this term was first used
by Fisher, 1928) approaches normal shape, even if the distribution of the variable in
question is not normal.
As the sample size (of the samples used to create the sampling distribution of the
mean) increases, the shape of the sampling distribution approaches normal. Note that
for n = 30, the shape of that distribution is “almost” perfectly normal. This principle
is called the central limit theorem (this term was first used by Polya in 1920).
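The principle can be checked with a small simulation. The sketch below is illustrative only and is not taken from the chapter: it assumes an exponential variable as the non-normal population, and the helper names (sample_means, skewness), the sample sizes, and the number of replications are arbitrary choices. It uses skewness as a rough measure of departure from normal shape (a normal distribution has skewness 0).

```python
import random
import statistics


def sample_means(n, reps, rng):
    """Return `reps` sample means, each from n draws of an exponential(1) variable."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


def skewness(values):
    """Moment-based skewness; values near 0 indicate a symmetric shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(0)  # fixed seed so the run is repeatable

# Skewness of the sampling distribution of the mean for increasing sample sizes.
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (1, 5, 30)}

for n, g in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {g:.2f}")
```

The printed skewness should shrink toward 0 as n grows, which is the behavior the central limit theorem describes. How close to normal the distribution is at n = 30 depends on how skewed the underlying variable is.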
6.5.1 Violating the Normality Assumption
How do we know the consequences of violating the normality assumption? Although
many statements made in the preceding paragraphs can be proven mathematically,
some of them do not have theoretical proofs and can be demonstrated only empirically,
via so-called Monte Carlo experiments. In these experiments, large numbers of samples
are generated by a computer following predesigned specifications, and the results
from such samples are analyzed using a variety of tests. This way we can evaluate
empirically the type and magnitude of errors or biases to which we are exposed when
certain theoretical assumptions of the tests we are using are not met by our data.
Specifically, Monte Carlo studies were used extensively with normal-distribution-based
tests to determine how sensitive they are to violations of the assumption of normal
distribution of the analyzed variables in the population. The general conclusion
from these studies is that the consequences of such violations are less severe
than previously thought. Although these conclusions should not entirely discourage
anyone from being concerned about the normality assumption, they have increased
the overall popularity of the distribution-dependent statistical tests in many areas.
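A Monte Carlo experiment of this kind might look like the following sketch. It is an illustration under stated assumptions, not a reproduction of any study mentioned above: the test is a one-sample t test at the 5% level with n = 30, the non-normal population is assumed to be exponential, the critical value 2.045 is the standard table value for 29 degrees of freedom, and rejection_rate is a hypothetical helper name. In both runs the null hypothesis is true, so every rejection is a Type I error.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 29 degrees of freedom
# (from standard t tables; hardcoded to avoid external dependencies).
T_CRIT_DF29 = 2.045


def rejection_rate(draw, true_mean, n, reps, crit):
    """Fraction of simulated samples in which the t test rejects a true null."""
    rejections = 0
    for _ in range(reps):
        sample = [draw() for _ in range(n)]
        mean = statistics.fmean(sample)
        std_err = statistics.stdev(sample) / math.sqrt(n)
        if abs(mean - true_mean) / std_err > crit:
            rejections += 1
    return rejections / reps


rng = random.Random(1)  # fixed seed so the run is repeatable

# Population mean is 1.0 in both cases, so the null hypothesis holds.
normal_rate = rejection_rate(lambda: rng.gauss(1.0, 1.0), 1.0, 30, 4000, T_CRIT_DF29)
skewed_rate = rejection_rate(lambda: rng.expovariate(1.0), 1.0, 30, 4000, T_CRIT_DF29)

print(f"Type I error rate, normal data: {normal_rate:.3f}")
print(f"Type I error rate, skewed data: {skewed_rate:.3f}")
```

Comparing the two rates against the nominal 0.05 gives an empirical estimate of how much the violation of normality distorts the test in this particular setup. Other sample sizes, distributions, or tests would need their own runs.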
6.6 SUMMARY
In this chapter, we have given a very basic review of the statistical terms and
methods that are encountered in this topic. We reviewed the collection, classification,
summarization, organization, analysis, and interpretation of data. We covered both
descriptive and inferential statistics with examples, and discussed a practical view
of common probability distributions, modeling, and statistical methods.