Clarke et al. (2003) reported on an investigation conducted at a South African wood
mill of the underlying relationships between genetic (species) and physiological factors
of wood as well as pulp quality. The data were placed in the public domain with the
intention of inviting researchers to perform their own statistical analyses. The data are
also described by Gardner et al. (2005) and are represented here in the form of an R
dataframe Pine.data. This dataframe consists of the following measurements:
species     Five pine species: Pinus elliottii, P. kesiya, P. maximinoi, P. patula
            and P. taeda
TotYield    Total pulp yield expressed as a percentage of the original mass
Alkali      Percentage alkali consumption
Density     Wood density in kg m^-3
TEA         Tensile energy absorption in mJ g^-1
Tensile     Tensile index
Tear        Tearing index in mN m^2 g^-1
Burst       Burst index in kPa m^2 g^-1
Growth      Height in metres at 11 years
A preliminary statistical analysis of Pine.data shows that not only are there large
differences between the within-species covariance matrices but also, due to small sample
sizes, some of these covariance matrices are singular. The AoD described above needs
only the assumption that a distance function exists such that inter-sample distances can
be calculated between each pair of the 37 rows of Pine.data. In this example we
restrict ourselves to Pythagorean distances between the samples (after normalizing the
Pine.data to unit column variances) so that, as we have seen above, the AoD biplot
is simply a PCA biplot of the class means with the individual samples interpolated onto
the display. This biplot is shown in Figure 5.34.
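The preprocessing described above (normalizing to unit column variances, then taking Pythagorean distances between samples and forming class means) can be sketched as follows. This is a minimal illustration in Python rather than R, using only the standard library; the function names, the toy measurements and the class labels are assumptions made for the example and are not taken from Pine.data. The PCA step that produces the biplot itself is not shown.

```python
import math
import statistics

# Hypothetical toy measurements (NOT values from Pine.data): rows are samples,
# columns are two made-up variables on very different scales.
toy_rows = [
    [50.0, 400.0],
    [52.0, 420.0],
    [47.0, 450.0],
    [49.0, 470.0],
]
toy_labels = ["A", "A", "B", "B"]  # hypothetical class labels


def normalize_columns(rows):
    """Centre each column and scale it to unit sample variance."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def distance_matrix(rows):
    """Pythagorean (Euclidean) distance between every pair of rows."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def class_means(rows, labels):
    """Mean vector of each class, keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {
        label: [statistics.mean(col) for col in zip(*members)]
        for label, members in groups.items()
    }


scaled = normalize_columns(toy_rows)
distances = distance_matrix(scaled)
means = class_means(scaled, toy_labels)
```

Without the scaling step the second toy variable, which is measured in much larger units, would dominate every distance; after scaling, both variables contribute on a comparable footing.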
The AoD biplot shows that the five group means lie at quite a distance from one
another, with P.kes and P.tae closest. These two species are very similar with
respect to Tensile and Growth; they differ mostly on Tear, Burst and TEA. P.max
and P.pat are also very similar on Tensile and Growth, showing small values in
contrast to the large values of P.kes and P.tae. P.ell has moderate Tensile and Growth
values but has the smallest Alkali values. If pulp is needed for a product requiring
high Tensile values, then P.kes and P.tae are candidates to consider; if the product
requires high Density values, then P.max is better avoided, although it gives the
maximum TotYield.
Are the differences between the five group means statistically significant? To
answer this question we can turn to the breakdown of the total sum of squared distances
that we have seen in Section 5.8 to be equivalent to the usual analysis of variance
breakdown of a total sum of squares. For the normalized pine data this becomes T = 280,
B = 84.6722 and W = 195.3278. A permutation testing procedure that makes no
distributional or homogeneous covariance matrix assumptions can be employed to test the
null hypothesis of no differences between the group means. Under the null hypothesis
the B term above should remain approximately constant over random permutations
of the 37 samples to form the five groups, keeping the sample sizes fixed. However, under the
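The breakdown T = B + W and the permutation procedure described in this paragraph can be sketched as follows. This is a minimal illustration in Python using only the standard library, and it assumes Pythagorean distances so that the breakdown can be computed as ordinary sums of squares about the grand mean and the group means. The function names, the toy data, the number of permutations and the way the p-value is formed (the proportion of permuted B values at least as large as the observed one, counting the observed arrangement itself) are assumptions for the example, not the procedure as implemented by the authors.

```python
import random

# Hypothetical toy data (NOT Pine.data): two well-separated groups of three samples.
toy_rows = [
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
    [10.0, 10.0], [11.0, 10.0], [10.0, 11.0],
]
toy_labels = ["a", "a", "a", "b", "b", "b"]


def sums_of_squares(rows, labels):
    """Return (T, B, W): total, between-group and within-group sums of squares."""
    n, p = len(rows), len(rows[0])
    grand = [sum(r[j] for r in rows) / n for j in range(p)]
    total = sum((r[j] - grand[j]) ** 2 for r in rows for j in range(p))
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    between = within = 0.0
    for members in groups.values():
        size = len(members)
        mean = [sum(r[j] for r in members) / size for j in range(p)]
        between += size * sum((mean[j] - grand[j]) ** 2 for j in range(p))
        within += sum((r[j] - mean[j]) ** 2 for r in members for j in range(p))
    return total, between, within


def permutation_p_value(rows, labels, n_perm=999, seed=0):
    """Shuffle the labels (group sizes stay fixed) and compare B with its observed value."""
    rng = random.Random(seed)
    observed = sums_of_squares(rows, labels)[1]
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sums_of_squares(rows, shuffled)[1] >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement itself as one of the permutations.
    return (hits + 1) / (n_perm + 1)


T, B, W = sums_of_squares(toy_rows, toy_labels)
p_value = permutation_p_value(toy_rows, toy_labels)
```

Shuffling the label vector, rather than drawing new labels at random, is what keeps the five (here two) sample sizes fixed across permutations, as the text requires.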