6.4 Regression Diagnostic Plots
After developing an initial reasonably fitting linear regression model, an analyst may
wish to assess whether any of the observations have an unusual impact on the quality of
the fit.
6.4.1 Case Statistics
A strategy for doing so is to examine various case statistics that have a value for each
of the n cases in the data set. Belsley et al. ( ) present detailed discussions of
case statistics, including definitions, formulas, interpretation, and suggested
thresholds for flagging a case as unusual. If a case statistic has a value that is
unusual, based on thresholds developed in the literature, the analyst should scrutinize
the case. One action the analyst might take is to delete the case. This is justified if
the analyst determines that the case is not a member of the same population as the other
cases in the data set. But deletion is just one possibility. Another is to determine
that the flagged case is unusual in ways not captured by the information available in
the present data set, which may suggest a need for additional predictors in the model.
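To make the effect of deleting a case concrete, the following minimal Python sketch refits a one-predictor least-squares line with each case left out in turn and records how far the slope moves. The data and the function names (fit_line, slope_change_on_deletion) are made up for illustration and are not taken from the kidney example. The change reported is the raw, unscaled difference in slope, whereas the case statistics in the literature standardize such changes.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def slope_change_on_deletion(xs, ys):
    """For each case i: full-data slope minus the slope refit without case i."""
    _, full_slope = fit_line(xs, ys)
    changes = []
    for i in range(len(xs)):
        x_rest = xs[:i] + xs[i + 1:]
        y_rest = ys[:i] + ys[i + 1:]
        _, slope_without_i = fit_line(x_rest, y_rest)
        changes.append(full_slope - slope_without_i)
    return changes


# Hypothetical data: five cases on the line y = 2x, plus one case far off it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 5.0]

changes = slope_change_on_deletion(xs, ys)
# Index (0-based) of the case whose deletion moves the slope the most.
most_influential = max(range(len(xs)), key=lambda i: abs(changes[i]))
print(most_influential, changes[most_influential])
```

In this made-up data set the last case dominates: removing it restores the slope of 2 shared by the other five cases. Whether such a case should actually be deleted remains the judgment described above.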
We focus on five distinct case statistics, each having a different function and
interpretation. (One of these, DFBETAS, is a vector with a distinct value for each
regression coefficient, including the intercept coefficient.) For small data sets the
analyst may choose to display each statistic for all cases. For larger data sets we
suggest that the analyst display only those values of the case statistics that exceed a
threshold, or flag, indicating that the case is unusual in some way.
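As a sketch of this flagging idea, the Python code below computes one case statistic, leverage (the diagonal of the hat matrix), for every case and lists only the cases that exceed a threshold. The threshold 2p/n, with p coefficients including the intercept, is a commonly cited rule of thumb and is an assumption here, not necessarily the flag used by the authors. The data and helper names are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(m)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverages(X):
    """Diagonal of the hat matrix X (X'X)^-1 X', one value per case."""
    xtx_inv = inverse(matmul(transpose(X), X))
    p = len(xtx_inv)
    return [
        sum(row[j] * xtx_inv[j][k] * row[k] for j in range(p) for k in range(p))
        for row in X
    ]


def flag_cases(values, threshold):
    """1-based case numbers whose statistic exceeds the threshold."""
    return [i + 1 for i, v in enumerate(values) if v > threshold]


# Hypothetical predictor values; the last case sits far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
X = [[1.0, xi] for xi in x]      # intercept column plus one predictor

h = leverages(X)
n, p = len(X), len(X[0])
threshold = 2 * p / n            # assumed rule of thumb, see the note above
flagged = flag_cases(h, threshold)
print(flagged)
```

Only the last case is flagged for this made-up data. The same flag_cases helper could be applied to any other case statistic once its values and an accepted threshold are available.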
A regression diagnostic plot displays all commonly used case statistics on a single
page. Included are thresholds for flagging cases as unusual, along with identification
of such cases. Heretofore, regression diagnostics have been presented on multiple pages
of tabular output with one row per observation. It is very difficult to read such tables
and examine them for cases that exceed accepted thresholds.
6.4.2 Example - Kidney Data
Creatinine clearance is an important but difficult-to-measure indicator of kidney
function. Shih and Weisberg ( ), also presented in Neter et al. ( ), discuss the
modeling of clearance as a function of the more readily measured variables serum
clearance concentration (concent), age, and weight. The datafile is
(hh/datasets/kidney.dat).
At an intermediate stage of analysis of these data, the researchers posed the linear
regression model
clearance ~ concent + age + weight + concent * age
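Assuming the formula follows the usual Wilkinson-Rogers conventions (as in R or S-Plus, which is an assumption because the software is not named here), concent * age expands to the two main effects plus their product. The model therefore has five coefficients: intercept, concent, age, weight, and the concent-by-age interaction. The Python sketch below builds such a design matrix. The three cases are made-up values, not the kidney data, and design_row is a hypothetical helper.

```python
COLUMNS = ["(Intercept)", "concent", "age", "weight", "concent:age"]


def design_row(concent, age, weight):
    """One design-matrix row: intercept, main effects, concent-by-age product."""
    return [1.0, concent, age, weight, concent * age]


# Made-up cases for illustration only (NOT taken from kidney.dat).
cases = [
    {"concent": 0.7, "age": 38.0, "weight": 71.0},
    {"concent": 1.5, "age": 62.0, "weight": 80.0},
    {"concent": 1.1, "age": 25.0, "weight": 65.0},
]

X = [design_row(**case) for case in cases]
for row in X:
    print(dict(zip(COLUMNS, row)))
```

Because concent and age already appear as main effects, writing them again alongside concent * age adds no further columns under these conventions.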
Figure . is a regression diagnostic plot for this model. It flags five cases as being
unusual. Case has a high leverage value, implying an unusual set of predictors. The