SOLVE THE RIGHT PROBLEM
Don't be too quick to turn on the computer. Bypassing the brain to
compute by reflex is a sure recipe for disaster.
Be sure of your objectives for the model: Are you trying to uncover
cause-and-effect mechanisms? Or derive a formula for use in predictions?
If the former is your objective, standard regression methods may not be
appropriate.
A researcher studying how neighborhood poverty levels affect violent
crime rates hit an apparent statistical roadblock. Some important
criminological theories suggest that this positive relationship is curvilinear
with an accelerating slope, while other theories suggest a decelerating
slope. Because the crime data are highly variable, previous analyses had used
the logarithm of the primary end point—violent crime rate—and reported
a significant negative quadratic term (poverty*poverty) in their least-
squares models. The researcher felt that such results were suspect, that
the log transformation alone might have biased the results toward
finding a significant negative quadratic term for poverty.
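The researcher's suspicion can be checked with a small numerical sketch. This is our illustration, not part of the study described. The hypothetical data below have a perfectly constant slope, so on the raw scale the least-squares quadratic term is zero. Because the logarithm is concave, fitting the same quadratic model to the logged values yields a negative quadratic term anyway. The poverty range, the line 2 + 0.5 × poverty, and the helper `quadratic_term` are all assumptions made for illustration. Centering X inside the helper leaves the quadratic coefficient unchanged but keeps the normal equations well conditioned.

```python
import math


def quadratic_term(xs, ys):
    """Least-squares coefficient c2 in the fit y = c0 + c1*u + c2*u**2.

    u is x centered at its mean; centering leaves c2 unchanged but keeps
    the 3x3 normal equations well conditioned. The system is solved by
    Gaussian elimination with partial pivoting.
    """
    mean_x = sum(xs) / len(xs)
    us = [x - mean_x for x in xs]
    # Normal equations A c = b: A[j][k] = sum u^(j+k), b[j] = sum y * u^j
    a = [[sum(u ** (j + k) for u in us) for k in range(3)] for j in range(3)]
    b = [sum(y * u ** j for u, y in zip(us, ys)) for j in range(3)]
    for col in range(3):
        # Partial pivoting: bring the largest entry in this column to the top
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 3):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    coef = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):  # back substitution
        rest = sum(a[row][k] * coef[k] for k in range(row + 1, 3))
        coef[row] = (b[row] - rest) / a[row][row]
    return coef[2]


# Hypothetical data: crime rate rises *linearly* with poverty (constant slope).
poverty = [float(p) for p in range(5, 41)]   # percent in poverty, 5..40
crime = [2.0 + 0.5 * p for p in poverty]      # no curvature at all
log_crime = [math.log(c) for c in crime]

print(f"quadratic term, raw scale: {quadratic_term(poverty, crime):.2e}")
print(f"quadratic term, log scale: {quadratic_term(poverty, log_crime):.2e}")
```

On the raw scale the printed term is zero up to rounding error; on the log scale it is negative. By itself, then, a negative quadratic term fitted to logged data says nothing about whether the slope on the original scale decelerates.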
But quadratic terms and log transforms are irrelevancies, artifacts resulting
from an attempt to squeeze the data into the confines of a linear
regression model. The issue appears to be whether the rate of change of
crime rates with poverty levels is a constant, increasing, or decreasing
function of poverty levels. Resolution of this issue requires a totally
different approach.
Suppose Y denotes the variable you are trying to predict and X denotes
the predictor, and that the observations have been sorted in increasing
order of X (with no tied X values). Replace each of the y[i] by the slope
y*[i] = (y[i+1] - y[i])/(x[i+1] - x[i]). Replace each of the x[i] by the
midpoint of the interval over which the slope is measured,
x*[i] = (x[i] + x[i+1])/2. Use the permutation methods described in
Chapter 5 to test for a correlation, if any, between y* and x*. A positive
correlation indicates an accelerating slope; a negative correlation, a
decelerating slope.
Correlations can be deceptive. Variable X can have a statistically significant
correlation with variable Y solely because X and Y are both dependent on
a third variable Z. The price of corn is inversely related to
the number of hay-fever cases only because the weather that produces a
bumper crop of corn generally yields a bumper crop of ragweed as well.
Even if the causal force X under consideration has no influence on the
dependent variable Y, the effects of unmeasured selective processes can
produce an apparent test effect. Children were once taught that storks
brought babies. This juxtaposition of bird and baby makes sense (at least
to a child) because where there are houses there are both families and
chimneys where storks can nest. The bad-air, or miasma, model (“common
sense” two centuries ago) works rather well at explaining respiratory
illnesses and not at all at explaining intestinal ones. An understanding of the