We now add a second patient diagnosis to the regression. Table 14 gives the chi-square table for
pneumonia and septicemia.
Of the patients with septicemia only (pneumonia=0), about 20% died, increasing to 28% for patients
with both septicemia and pneumonia. For patients without septicemia but with pneumonia, about 6% died.
The classification table for the logistic regression is given in Table 15.
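The percentages quoted above are the row percentages of Table 14. As a quick check, the following sketch recomputes them from the cell frequencies. The dictionary layout and the function name death_rate are illustrative assumptions, not part of the original analysis.

```python
# Cell frequencies from Table 14, keyed by (septicemia, pneumonia).
# Each value is (survived, died). This layout is an illustrative choice.
counts = {
    (0, 0): (7307726, 103759),
    (0, 1): (368553, 21783),
    (1, 0): (123403, 31660),
    (1, 1): (25175, 9948),
}


def death_rate(septicemia, pneumonia):
    """Return the percentage of patients in the cell who died (the row percent)."""
    survived, died = counts[(septicemia, pneumonia)]
    return 100.0 * died / (survived + died)


for sept, pneu in sorted(counts):
    rate = death_rate(sept, pneu)
    print(f"septicemia={sept}, pneumonia={pneu}: {rate:.2f}% died")
```

The four rates reproduce the Row Pct entries in the Died=1 column of the table.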
Again, for any threshold value below 98%, the logistic regression model is over 90% accurate, but only
because it classifies most of the observations as non-occurrences; as a result, the false negative rate
is over 70%. In other words, adding a second input variable did not change the problems with the
regression, which are caused by attempting to predict a rare occurrence. We add Immune Disorder to the
model (Table 16). The problem persists, and will continue to persist regardless of the number of input
variables. We need to change the sample so that the two outcome groups (occurrences and
non-occurrences) are close to equal in size.
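To see why a rare outcome inflates accuracy, consider the extreme case of a rule that labels every patient as a non-occurrence. The sketch below uses only the column totals of Table 14. It assumes that "false negative rate" means the share of actual deaths predicted as survivors, which may differ from the definition used in the classification tables, and it assumes that the groups are balanced by sampling the survivors down to the number of deaths, which is one possible way of equalizing the group sizes.

```python
# Column totals from Table 14 (both septicemia strata combined).
survived = 7676279 + 148578   # Died = 0
died = 125542 + 41608         # Died = 1
total = survived + died

# A rule that labels every patient as a non-occurrence (survived).
true_negatives = survived
false_negatives = died        # every death is missed
accuracy = true_negatives / total
false_negative_rate = false_negatives / died

print(f"mortality rate:      {died / total:.2%}")
print(f"accuracy:            {accuracy:.2%}")
print(f"false negative rate: {false_negative_rate:.2%}")

# Assumed balancing scheme: keep all deaths and draw a random sample of
# survivors of the same size. This is the fraction of survivors to keep.
keep_fraction = died / survived
print(f"survivor sampling fraction for a 1:1 sample: {keep_fraction:.2%}")
```

Because only about 2% of the patients died, the rule is right for roughly 98% of the observations while missing every death, which is the pattern the classification tables show in a less extreme form.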
The Generalized Linear Model
The linear model is one of the most important tools in the statistical analysis of data, but there are types
of problems for which the linear model is not appropriate. The main problem occurs when the data are
not normally distributed and the variance is not constant. The generalized linear model extends the
general linear model and the regression model by addressing these issues, and is therefore applicable to a
wider range of data analysis problems. The generalized linear model enlarges the class of linear models
by allowing the distribution of Y for a fixed x to come from the exponential family of distributions,
which includes important distributions such as the binomial, Poisson, exponential, and gamma
distributions in addition to the normal distribution. The exponential family has a probability density
function of the form
$$f(y_i) = \exp\left( \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right)$$
where $\theta_i$ and $\phi$ are parameters and $a_i(\phi)$, $b(\theta_i)$, and $c(y_i, \phi)$ are known
functions. A link function can be used to relate the expected value of the outcome to the linear
predictor, since the effect of the predictors on the dependent variable may not be linear. The equation
of the model is given by
Table 14. Chi-square table for pneumonia and septicemia
Cell contents: Frequency (Row Pct, Col Pct)

Controlling for septicemia=0

pneumonia   Died=0                    Died=1                   Total
0           7307726 (98.60, 95.20)    103759 (1.40, 82.65)     7411485
1           368553 (94.42, 4.80)      21783 (5.58, 17.35)      390336
Total       7676279                   125542                   7801821

Controlling for septicemia=1

pneumonia   Died=0                    Died=1                   Total
0           123403 (79.58, 83.06)     31660 (20.42, 76.09)     155063
1           25175 (71.68, 16.94)      9948 (28.32, 23.91)      35123
Total       148578                    41608                    190186
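The exponential family form given earlier for the generalized linear model can be checked numerically for two of the distributions it is said to include. The sketch below writes the Bernoulli distribution (a binomial with one trial) and the Poisson distribution in that form, taking a(ϕ) = 1 in both cases. The function names and the choice to pass c as a function of y alone (since ϕ is fixed here) are assumptions made for illustration.

```python
import math


def exp_family_density(y, theta, a_phi, b, c):
    """Evaluate exp((y * theta - b(theta)) / a(phi) + c(y, phi)).

    a_phi is the already evaluated value of a(phi); b and c are callables.
    c takes only y because phi is held fixed in these examples.
    """
    return math.exp((y * theta - b(theta)) / a_phi + c(y))


def bernoulli_as_exp_family(y, p):
    """Bernoulli(p): theta = log-odds, b(theta) = log(1 + e^theta), c = 0."""
    theta = math.log(p / (1.0 - p))
    return exp_family_density(
        y, theta, 1.0,
        b=lambda t: math.log(1.0 + math.exp(t)),
        c=lambda _y: 0.0,
    )


def poisson_as_exp_family(y, mu):
    """Poisson(mu): theta = log(mu), b(theta) = e^theta, c = -log(y!)."""
    theta = math.log(mu)
    return exp_family_density(
        y, theta, 1.0,
        b=math.exp,
        c=lambda _y: -math.lgamma(_y + 1),
    )


# Compare with the usual textbook formulas.
print(bernoulli_as_exp_family(1, 0.2), 0.2)
print(poisson_as_exp_family(3, 2.0), 2.0**3 * math.exp(-2.0) / math.factorial(3))
```

In both cases the value computed from the exponential family form agrees with the familiar probability function up to floating-point error.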