Environmental Engineering Reference
Solution

In accordance with the KW procedure, the data are first combined and ranked, which yields the following results:

Data   Set  Rank    Data   Set  Rank    Data   Set  Rank
0.71    1     1     2.44    1    18     5.45    2    35
0.78    1     2     2.53    1    19     6.18    2    36
0.89    2     3     2.66    2    20     6.34    2    37
1.02    1     4     2.68    1    21     6.65    2    38
1.09    1     5     2.90    2    22     6.67    1    39
1.22    2     6     3.25    1    23     6.82    2    40
1.24    2     7     3.35    2    24     6.89    2    41
1.42    2     8     3.51    1    25     7.27    1    42
1.53    2     9     3.52    2    26     7.66    1    43
1.70    2    10     3.54    1    27     7.80    2    44
1.74    1    11     3.62    2    28     8.33    1    45
1.79    1    12     3.67    2    29     8.55    2    46
1.97    1    13     3.75    2    30    10.44    1    47
2.12    1    14     4.46    1    31    15.64    1    48
2.19    1    15     4.62    2    32    17.62    2    49
2.20    1    16     4.87    2    33    18.79    1    50
2.43    2    17     5.00    1    34

Based on these results, the KW parameters are calculated by summing the ranks in Sets 1 and 2, which yields $R_1 = 605$ and $R_2 = 670$. Since $J_1 = 25$, $J_2 = 25$, $N = J_1 + J_2 = 50$, and $K = 2$, the KW statistic given by Equation (10.105) is

$$K_w = \frac{12}{N(N+1)} \sum_{k=1}^{K} \frac{R_k^2}{J_k} - 3(N+1) = \frac{12}{50(50+1)} \left( \frac{605^2}{25} + \frac{670^2}{25} \right) - 3(50+1) = 0.398$$

The chi-square statistic at the 5% significance level with $K - 1 = 2 - 1 = 1$ degree of freedom is given in Appendix C.3 as $\chi^2_{0.05} = 3.841$. Since the calculated value of $K_w$ (= 0.398) is less than $\chi^2_{0.05}$ (= 3.841), the hypothesis that the population means are the same for the two data sets is accepted at the 5% significance level.

10.9.4 Normality

The assumption of normality (or log-normality) of the population distribution underlies many statistical analyses, and so it is commonly necessary to evaluate whether the data (or log-transformed data) support the assertion that the population distribution is normal.

10.9.4.1 Shapiro-Wilk Test. The Shapiro-Wilk test (Shapiro and Wilk, 1965), which is sometimes called the W test, is one of the best tests for normality and is particularly useful for detecting departures from normality in the tails of sample distributions. The steps to be followed in applying this test are as follows:

Step 1. Rank order the sample data.
Step 2. Compute a weighted sum, $b$, of the differences between the most extreme observations.
Step 3. Divide the weighted sum by a multiple of the standard deviation, and square the result to get the Shapiro-Wilk statistic, $W$, defined as

$$W = \left[ \frac{b}{S_x \sqrt{N-1}} \right]^2 \tag{10.107}$$

where the numerator is computed as

$$b = \sum_{i=1}^{k} a_{N-i+1} \left( x_{N-i+1} - x_i \right) = \sum_{i=1}^{k} b_i \tag{10.108}$$

where $x_i$ represents the smaller ordered value in the $i$th pair of extremes (and $x_{N-i+1}$ the larger), the coefficients $a_i$ depend on the sample size, $N$, as tabulated extensively by Shapiro and Wilk (1965), and the value of $k$ is the greatest integer less than or equal to $N/2$. Values of $W$ for selected sample sizes at significance levels, $\alpha$, of 0.01 and 0.05 are shown in Table 10.6. More extensive relationships between $W$, $N$, and $\alpha$ are generally available in statistical analysis codes, such as MATLAB® (MathWorks, Natick, MA) and Statistica® (StatSoft, Tulsa, OK).

The hypothesis of normality is rejected at the $\alpha$ significance level when the calculated Shapiro-Wilk statistic is less than the applicable value given in Table 10.6.

TABLE 10.6. Values of Shapiro-Wilk Statistic, W

                   Significance level, α
Sample Size, N     0.01      0.05
 3                 0.753     0.767
 5                 0.686     0.762
10                 0.781     0.842
15                 0.835     0.881
20                 0.868     0.905
25                 0.888     0.918
30                 0.900     0.927
35                 0.910     0.934
40                 0.919     0.940
45                 0.926     0.945
50                 0.930     0.947

Source of data: Shapiro and Wilk (1965).
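
The three steps of the Shapiro-Wilk test and the decision rule based on Table 10.6 can be sketched in code. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `shapiro_wilk_w`, `reject_normality`, and `W_CRITICAL` are hypothetical, the coefficients $a_{N-i+1}$ are not computed here and must be supplied by the caller from the tables of Shapiro and Wilk (1965), and the critical values are simply copied from Table 10.6. The usage example assumes the single coefficient for $N = 3$ is $1/\sqrt{2} \approx 0.7071$, and the three sample values are made up for illustration.

```python
import math

# Critical values of W copied from Table 10.6: {N: {alpha: W_critical}}
W_CRITICAL = {
    3: {0.01: 0.753, 0.05: 0.767},
    5: {0.01: 0.686, 0.05: 0.762},
    10: {0.01: 0.781, 0.05: 0.842},
    15: {0.01: 0.835, 0.05: 0.881},
    20: {0.01: 0.868, 0.05: 0.905},
    25: {0.01: 0.888, 0.05: 0.918},
    30: {0.01: 0.900, 0.05: 0.927},
    35: {0.01: 0.910, 0.05: 0.934},
    40: {0.01: 0.919, 0.05: 0.940},
    45: {0.01: 0.926, 0.05: 0.945},
    50: {0.01: 0.930, 0.05: 0.947},
}


def shapiro_wilk_w(data, coeffs):
    """Return W following Equations (10.107) and (10.108).

    coeffs[i - 1] must hold a_{N-i+1} for i = 1..k, where k = N // 2.
    The coefficients are an input; they come from published tables.
    """
    x = sorted(data)                      # Step 1: rank order the sample
    n = len(x)
    k = n // 2                            # greatest integer <= N/2
    if len(coeffs) != k:
        raise ValueError("expected %d coefficients, got %d" % (k, len(coeffs)))
    # Step 2: weighted sum of differences between paired extremes
    b = sum(coeffs[i] * (x[n - 1 - i] - x[i]) for i in range(k))
    # Step 3: (N - 1) * S_x**2 equals the sum of squared deviations
    mean = sum(x) / n
    sum_sq = sum((v - mean) ** 2 for v in x)
    return b * b / sum_sq


def reject_normality(w, n, alpha):
    """Reject normality when W falls below the tabulated value."""
    return w < W_CRITICAL[n][alpha]


# Usage with made-up data; the N = 3 coefficient 1/sqrt(2) is an assumption
sample = [1.0, 2.0, 4.0]
w = shapiro_wilk_w(sample, [math.sqrt(0.5)])
print(round(w, 4), reject_normality(w, 3, 0.05))  # prints: 0.9643 False
```

Passing the coefficients in as an argument keeps the sketch honest about what it does not know: only sample sizes listed in Table 10.6 can be tested with `reject_normality`, and other values of $N$ would need the fuller tables or a statistical package.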
 
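
As a numerical check on the Kruskal-Wallis example earlier in this section, the rank sums and the test statistic can be recomputed from the tabulated data. The following is only a sketch: the function name `kruskal_wallis` and the variable names are illustrative, the set assignments are transcribed from the example's data table, and the simple ranking used here assumes there are no tied values, which holds for these 50 observations.

```python
# Observations transcribed from the example's data table, grouped by set
set1 = [0.71, 0.78, 1.02, 1.09, 1.74, 1.79, 1.97, 2.12, 2.19, 2.20,
        2.44, 2.53, 2.68, 3.25, 3.51, 3.54, 4.46, 5.00, 6.67, 7.27,
        7.66, 8.33, 10.44, 15.64, 18.79]
set2 = [0.89, 1.22, 1.24, 1.42, 1.53, 1.70, 2.43, 2.66, 2.90, 3.35,
        3.52, 3.62, 3.67, 3.75, 4.62, 4.87, 5.45, 6.18, 6.34, 6.65,
        6.82, 6.89, 7.80, 8.55, 17.62]


def kruskal_wallis(groups):
    """Return (rank sums, K_w) for a list of groups; assumes no ties."""
    # Combine all observations, remembering which group each came from
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    # K_w = 12 / (N (N + 1)) * sum(R_k^2 / J_k) - 3 (N + 1)
    total = sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
    kw = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return rank_sums, kw


rank_sums, kw = kruskal_wallis([set1, set2])
CHI2_05_1DOF = 3.841  # chi-square value quoted in the text (Appendix C.3)
print(rank_sums, round(kw, 3), kw < CHI2_05_1DOF)
# prints: [605, 670] 0.398 True
```

The output reproduces $R_1 = 605$, $R_2 = 670$, and $K_w = 0.398$, and the final comparison mirrors the conclusion that the hypothesis is accepted at the 5% significance level.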