the reader from the central theme that subject knowledge should be the touchstone
for decisions at all steps.
Modeling begins by satisfying the basic conditions for regression analysis. It is
assumed at the outset that the explanatory variable values are independent of each
other. Also, Type I regression techniques formally require no error in the explanatory
variable(s), although this requirement is often relaxed to the general premise that the
explanatory variable has an immaterial amount of error relative to that of the response
variable. Another approach, such as functional regression, might be required if this
premise were unjustified. Next, the regression residuals (the differences between the
observed response variable values and their predicted values) are assumed to be normally
distributed. Finally, the sample variance around the regression line is assumed to be
independent of the magnitude of the explanatory variable(s), that is, homoscedasticity is
assumed. The last two requirements can be satisfied in some cases by transforming one
or more of the variables prior to regression fitting. A common instance in this chapter
is the logarithmic transformation of the response variable, such as the log of EC50. This
transformation can resolve nonnormality of residuals and heteroscedasticity issues
while also conforming to the general toxicological paradigm that response is more often
related linearly to the logarithm of dose than to the arithmetic dose (Finney 1942, 1947).
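The assumptions above can be written compactly for a single explanatory variable. The symbols used here ($y_i$, $x_i$, $\beta_0$, $\beta_1$, $\varepsilon_i$, $\sigma^2$) are notation assumed for this sketch and are not taken from the chapter:

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n, \\
\varepsilon_i &\sim N(0, \sigma^2) \ \text{independently, with } \sigma^2 \text{ not depending on } x_i, \\
e_i &= y_i - \hat{y}_i = y_i - \bigl(\hat{\beta}_0 + \hat{\beta}_1 x_i\bigr).
\end{align*}
```

Under the logarithmic transformation, $y_i = \log(\mathrm{EC50}_i)$ takes the place of the untransformed EC50, and $x_i$ is treated as measured with negligible error. The residuals $e_i$ are the quantities examined for normality and for constant spread across $x_i$.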
A combination of univariate statistics and plots allows exploration of a candidate
model relative to these last two requirements. Bacterial bioluminescence 15-minute
EC50 data for 20 metal ions (McCloskey et al. 1996, Appendix 8.1) and the recently
developed softness index (Kinraide 2009) can be used to illustrate this approach.
The following Statistical Analysis System (SAS) code produces normality plots and
tests of the regression residuals (Figure 8.1, top). It also plots predicted
and observed data (Figure 8.1, middle) and regression residuals versus the explanatory
variable, σcon (Figure 8.1, bottom).
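/* Fit the linear model; with no DATA= option, SAS uses the most recently created data set */
PROC GLM;
  MODEL TOTLEC = SOFTCON;
  /* Save predicted values and residuals for the diagnostics below */
  OUTPUT OUT = LINEAR2 PREDICTED = PRED2 RESIDUAL = RES2;
RUN;
/* Normality tests and plots for the residuals */
PROC UNIVARIATE DATA = LINEAR2 NORMAL PLOT;
  VAR RES2;
RUN;
/* Plotting symbols: dots for observations, stars for predictions */
SYMBOL1 V = dot COLOR = black; SYMBOL2 V = star COLOR = black;
SYMBOL3 V = dot COLOR = black;
PROC GPLOT DATA = LINEAR2;
  /* Observed and predicted values versus the explanatory variable */
  PLOT TOTLEC*SOFTCON PRED2*SOFTCON / OVERLAY HAXIS = -1.5 to 1.5 by 0.5;
  /* Residuals versus the explanatory variable, with a reference line at zero */
  PLOT RES2*SOFTCON / VREF = 0 HAXIS = -1.5 to 1.5 by 0.5;
RUN;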
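The PROC GLM step above has no DATA= option, so it fits the most recently created data set, and the listing does not show how a log-transformed response would be prepared. A minimal sketch of that preparatory step follows. It assumes a hypothetical raw data set named METALS that holds untransformed EC50 values in a variable EC50 alongside SOFTCON. The names METALS, METALS_LOG, EC50, LOGEC50, LINEARLOG, PREDLOG, and RESLOG are illustrative and do not come from the chapter, and the use of base-10 logarithms is also an assumption.

```sas
/* Hypothetical input: data set METALS with variables EC50 and SOFTCON */
DATA METALS_LOG;
  SET METALS;
  /* The logarithm is undefined for nonpositive values, so those rows keep a missing LOGEC50 */
  IF EC50 > 0 THEN LOGEC50 = LOG10(EC50);
RUN;

/* Fit the same form of model to the transformed response, naming the input explicitly */
PROC GLM DATA = METALS_LOG;
  MODEL LOGEC50 = SOFTCON;
  OUTPUT OUT = LINEARLOG PREDICTED = PREDLOG RESIDUAL = RESLOG;
RUN;
QUIT;
```

Rows with a missing LOGEC50 are excluded from the fit by PROC GLM. Naming the input with DATA= guards against fitting the wrong data set when other steps run in between.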
The middle plot in Figure 8.1 shows the values predicted by the model given in the
upper right corner (asterisks) and the original observations (solid dots). The
observations are distributed uniformly along the axis of the explanatory variable with
no obvious gaps. This reduces the chance that a few extreme observations will have
disproportionate influence on the model fit. The coefficient of determination, r²,