Testing Hypotheses: Choosing a Test Statistic - Common Errors in Statistics

VERIFY THE DATA

The first step in any analysis is to verify that the data have been entered correctly. As noted in Chapter 3, GIGO (garbage in, garbage out). A short time ago, a junior biostatistician came into my office asking for help with covariate adjustments for race. “The data for race doesn't make sense,” she said. Indeed, the proportions of the various races did seem incorrect. No “adjustment” could be made. Nor was there any reason to believe that race was the only variable affected. The first and only solution was to do a thorough examination of the database and, where necessary, trace the data back to its origins until all the bad data had been replaced with good.

The SAS programmer's best analysis tool is PROC MEANS. Merely by examining the maximum and minimum values of all variables, one can often detect data that were entered in error. Some years ago, I found that the minimum value of one essential variable was zero. I brought this to the attention of a domain expert, who told me that a zero was impossible. As it turned out, the data were full of zeros, the explanation being that the executive in charge had been faking results. Of the 150 subjects in the database, only 50 were real.
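
The same minimum-and-maximum screen can be sketched outside SAS. The Python fragment below is only an illustration: the records, the variable names (age, weight_kg), and the plausible ranges are all hypothetical, and in practice the limits would come from a domain expert.

```python
# Hypothetical records; variable names and values are invented for illustration.
records = [
    {"age": 34, "weight_kg": 71.5},
    {"age": 41, "weight_kg": 0.0},    # a zero a domain expert would call impossible
    {"age": 29, "weight_kg": 64.2},
    {"age": 230, "weight_kg": 80.1},  # probably a keying error
]

# Assumed plausible limits for each variable (lowest, highest).
plausible = {"age": (18, 100), "weight_kg": (30.0, 250.0)}


def summarize(rows):
    """Return {variable: (minimum, maximum)} over all rows."""
    return {
        name: (min(r[name] for r in rows), max(r[name] for r in rows))
        for name in rows[0]
    }


def out_of_range(rows, limits):
    """Return {variable: [row indices]} for values outside the given limits."""
    flagged = {}
    for name, (low, high) in limits.items():
        bad = [i for i, r in enumerate(rows) if not low <= r[name] <= high]
        if bad:
            flagged[name] = bad
    return flagged


summary = summarize(records)
flags = out_of_range(records, plausible)

for name, (smallest, largest) in summary.items():
    print(f"{name}: min={smallest}, max={largest}")
print("suspect rows:", flags)
```

A flagged row is a prompt to go back to the source documents, not a licence to delete or "adjust" the value.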

Before you begin any analysis, verify that the data have been entered correctly.

COMPARING MEANS OF TWO POPULATIONS

The most common test for comparing the means of two populations is based upon Student's t. For Student's t test to provide significance levels that are exact rather than approximate, all the observations must be independent and, under the null hypothesis, all the observations must come from identical normal distributions.
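
For reference, the statistic itself can be sketched in a few lines. The text does not spell out the formula, so the fragment below assumes the classical equal-variance (pooled) form of the two-sample t statistic; the function name pooled_t and the sample values are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Return (t, degrees of freedom) for the equal-variance two-sample t statistic."""
    n, m = len(x), len(y)
    # Pooled estimate of the common variance (statistics.variance uses n - 1).
    pooled_var = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / n + 1 / m))
    return t, n + m - 2


# Invented observations for two treatment groups.
t_stat, df = pooled_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 3.0, 4.0, 5.0, 6.0])
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The significance level is then read from Student's t distribution with the stated degrees of freedom; that step is left to a statistical package here.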

Even if the distribution is not normal, the significance level of the t test is almost exact for sample sizes greater than 12; for most of the distributions one encounters in practice,² the significance level of the t test is usually within a percent or so of the correct value for sample sizes between 6 and 12.

There are more powerful tests than the t test for testing against non-normal alternatives. For example, a permutation test replacing the original observations with their normal scores is more powerful than the t test (Lehmann and D'Abrera, 1988).
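
One way to obtain normal scores is sketched below. It is an assumption on our part rather than the definition used by Lehmann and D'Abrera: each observation is replaced by the standard normal quantile of rank/(n + 1), a common approximation often called van der Waerden scores. The function name normal_scores is hypothetical, and ties are not given any special treatment.

```python
from statistics import NormalDist


def normal_scores(values):
    """Replace each value by the standard normal quantile of rank / (n + 1).

    Ties are ranked in order of appearance; no midrank correction is applied.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    scores = [0.0] * n
    for rank, index in enumerate(order, start=1):
        scores[index] = standard_normal.inv_cdf(rank / (n + 1))
    return scores


# Invented observations; the largest value receives the largest score.
print(normal_scores([10.0, 30.0, 20.0]))
```

The scores from the pooled sample would then take the place of the original observations in the test statistic.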

Permutation tests are derived by looking at the distribution of values the test statistic would take for each of the possible assignments of treatments to subjects. For example, if in an experiment two treatments were

² Here and throughout this text, we deliberately ignore the many exceptional cases (to the delight of the true mathematician) that one is unlikely to encounter in the real world.
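
The derivation of a permutation test described above can be sketched for two treatments as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the test statistic is taken to be the absolute difference in group means, every possible assignment is enumerated (practical only for small samples), and the function name and data are invented.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(x, y):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(x) + list(y)
    n, total_n = len(x), len(x) + len(y)
    total = sum(pooled)
    observed = abs(mean(x) - mean(y))

    assignments = 0
    at_least_as_extreme = 0
    # Each choice of n subjects for the first treatment is one possible assignment.
    for chosen in combinations(range(total_n), n):
        first_sum = sum(pooled[i] for i in chosen)
        diff = abs(first_sum / n - (total - first_sum) / (total_n - n))
        assignments += 1
        # Small tolerance guards against floating-point round-off in ties.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / assignments


# Invented data: 20 possible assignments, 2 of them as extreme as the observed one.
print(permutation_p_value([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

With larger samples the number of assignments grows too quickly to enumerate, and a random sample of assignments is commonly used instead.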
