Constructing QSARs for Metal Ions - Fundamental QSARs for Metal Ions


the reader from the central theme that subject knowledge should be the touchstone for decisions at all steps.

Modeling begins by satisfying the basic conditions for regression analysis. It is assumed at the outset that the explanatory variable values are independent of each other. Also, Type I regression techniques formally require no error in the explanatory variable(s), although this requirement is often relaxed to the general premise that the explanatory variable has an immaterial amount of error relative to that of the response variable. Another approach, such as functional regression, might be required if this premise were unjustified. Next, the regression residuals (the differences between the observed response variable values and their predicted values) are assumed to be normally distributed. Finally, the sample variance around the regression line is assumed to be independent of the magnitude of the explanatory variable(s), that is, homoscedasticity is assumed. The last two requirements can be satisfied in some cases by transforming one or more of the variables prior to regression fitting. A common instance in this chapter is the logarithmic transformation of the response variable, such as the log of EC50. This transformation can resolve nonnormality of residuals and heteroscedasticity while also conforming to the general toxicological paradigm that response is more often related linearly to the logarithm of dose than to the arithmetic dose (Finney 1942, 1947).
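For a single explanatory variable, the conditions described above can be summarized compactly. The symbols used here ($y_i$, $x_i$, $\beta_0$, $\beta_1$, $\varepsilon_i$, $\sigma_\varepsilon^2$) are generic notation introduced for illustration and are not taken from the chapter; the base of the logarithm is left unspecified because the text does not state it.

```latex
\begin{align}
  y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
      && \text{(linear model, } x_i \text{ treated as error-free)} \\
  \varepsilon_i &\sim \mathcal{N}\!\left(0, \sigma_\varepsilon^2\right)
      && \text{(normally distributed residuals)} \\
  \operatorname{Var}(\varepsilon_i \mid x_i) &= \sigma_\varepsilon^2 \quad \text{for all } x_i
      && \text{(homoscedasticity)} \\
  y_i &= \log\!\left(\mathrm{EC50}_i\right)
      && \text{(log-transformed response)}
\end{align}
```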

A combination of univariate statistics and plots allows exploration of a candidate model relative to these last two requirements. Bacterial bioluminescence 15-minute EC50 data for 20 metal ions (McCloskey et al. 1996, Appendix 8.1) and the recently developed softness index (Kinraide 2009) can be applied to illustrate this approach. The following Statistical Analysis System (SAS) code implements analyses with normality plots and tests of regression residuals (Figure 8.1 top). It also plots predicted and observed data (Figure 8.1 middle) and regression residuals versus the explanatory variable, σ_con (Figure 8.1 bottom).

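/* Regress the response (TOTLEC) on the softness index (SOFTCON) */
/* and save the predicted values and residuals in LINEAR2.       */
PROC GLM;
   MODEL TOTLEC = SOFTCON;
   OUTPUT OUT = LINEAR2 PREDICTED = PRED2 RESIDUAL = RES2;
RUN;

/* Normality tests and plots for the regression residuals. */
PROC UNIVARIATE NORMAL PLOT;
   VAR RES2;
RUN;

/* Plotting symbols: dots for observations, stars for predictions. */
SYMBOL1 V = dot COLOR = black;
SYMBOL2 V = star COLOR = black;
SYMBOL3 V = dot COLOR = black;

/* Observed and predicted values, then residuals, versus SOFTCON. */
PROC GPLOT;
   PLOT TOTLEC*SOFTCON PRED2*SOFTCON / OVERLAY HAXIS = -1.5 to 1.5 by 0.5;
   PLOT RES2*SOFTCON / VREF = 0 HAXIS = -1.5 to 1.5 by 0.5;
RUN;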
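For readers without access to SAS, the same sequence of steps (fit the line, save predicted values and residuals, then examine the residuals) can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a reproduction of the chapter's analysis: the `softness` and `ec50` values are made up for the example and are not the McCloskey et al. data, the base-10 logarithm is assumed, and the helper names (`fit_line`, `pearson`, `normal_scores`) are hypothetical. A normal probability plot correlation stands in for the formal normality tests that PROC UNIVARIATE reports.

```python
import math
from statistics import NormalDist


def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def normal_scores(n):
    """Approximate expected normal quantiles for ranks 1..n (Blom positions)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


# HYPOTHETICAL values for illustration only; NOT the data from Appendix 8.1.
softness = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
ec50 = [900.0, 310.0, 95.0, 33.0, 9.0, 3.2, 1.1]

# Log-transform the response before fitting (base 10 is an assumption here).
log_ec50 = [math.log10(value) for value in ec50]

# Fit the model, then save predicted values and residuals.
b0, b1 = fit_line(softness, log_ec50)
predicted = [b0 + b1 * s for s in softness]
residuals = [obs - pred for obs, pred in zip(log_ec50, predicted)]

# Coefficient of determination.
mean_y = sum(log_ec50) / len(log_ec50)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((obs - mean_y) ** 2 for obs in log_ec50)
r_squared = 1.0 - ss_res / ss_tot

# Normality check: correlation of ordered residuals with normal scores.
# Values near 1 are consistent with normally distributed residuals.
ppcc = pearson(sorted(residuals), normal_scores(len(residuals)))

# Crude homoscedasticity check: do absolute residuals trend with x?
# Values near 0 suggest no systematic change in spread.
spread_trend = pearson(softness, [abs(r) for r in residuals])

print(f"intercept = {b0:.3f}, slope = {b1:.3f}, r^2 = {r_squared:.3f}")
print(f"probability plot correlation = {ppcc:.3f}")
print(f"|residual| vs. x correlation = {spread_trend:.3f}")
```

With only a handful of points, both checks are rough guides at best; plotting the residuals against the explanatory variable, as the SAS code does, remains the more informative diagnostic.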

The middle plot in Figure 8.1 shows the values predicted (asterisks) with the model shown in its upper right corner, together with the original observations (solid dots). The observations are distributed uniformly along the axis of the explanatory variable with no obvious gaps. This minimizes the chance of a few extreme observations having more influence on the model fitting than others. The coefficient of determination, r²,
