model's range of application, potential sources of bias, and the method of
validation (see the following chapter). The section on “Limitations of the
Logistic Regression” from Bent and Archfield [2002] is ideal in this regard:
“The logistic regression equation developed is applicable for stream sites
with drainage areas between 0.02 and 7.00 mi² in the South Coastal Basin
and between 0.14 and 8.94 mi² in the remainder of Massachusetts, because
these were the smallest and largest drainage areas used in equation
development for their respective areas.” (The authors go on to subdivide the
area.)
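A range statement like this one can also be enforced in software, so that a user cannot silently extrapolate beyond the data used to fit the equation. The following is a minimal Python sketch, not part of Bent and Archfield's report: the two ranges are the ones quoted above, while the region keys and the function name `within_calibration_range` are hypothetical, and the authors' further subdivision of the area is ignored.

```python
# Applicable drainage-area ranges (mi²), as quoted from Bent and Archfield [2002].
# The region keys are hypothetical labels chosen for this sketch; the authors'
# finer subdivision of the state is not represented here.
APPLICABLE_RANGES = {
    "south_coastal_basin": (0.02, 7.00),
    "rest_of_massachusetts": (0.14, 8.94),
}


def within_calibration_range(drainage_area_mi2: float, region: str) -> bool:
    """Return True if the drainage area lies inside the range used to fit the equation."""
    low, high = APPLICABLE_RANGES[region]
    return low <= drainage_area_mi2 <= high


if __name__ == "__main__":
    sites = [
        (0.10, "south_coastal_basin"),
        (0.10, "rest_of_massachusetts"),
        (9.50, "rest_of_massachusetts"),
    ]
    for area, region in sites:
        inside = within_calibration_range(area, region)
        status = "inside range" if inside else "outside range (extrapolation)"
        print(f"{area:.2f} mi² in {region}: {status}")
```

The same 0.10 mi² site is acceptable in one region and an extrapolation in the other, which is exactly why the authors state the ranges separately.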
“The equation may not be reliable for losing reaches of streams, such as
for streams that flow off [an] area underlain by till or bedrock onto an area
underlain by stratified-drift deposits (these areas are likely more prevalent
where hillsides meet river valleys in central and western Massachusetts). At
this juncture of the different underlying surficial deposit types, the stream
can lose stream flow through its streambed. Generally, a losing stream
reach occurs where the water table does not intersect the streambed in the
channel (water table is below the streambed) during low-flow periods. In
these reaches, the equation would tend to overestimate the probability of
a stream flowing perennially at a site.”
“The logistic regression equation may not be reliable in areas of
Massachusetts where ground-water and surface-water drainage areas for a stream
site differ.” (The authors go on to provide examples of such areas.)
“In these areas, ground water can flow from one basin into another;
therefore, in basins that have a larger ground-water contributing area than
the surface-water drainage area the equation may underestimate the
probability that [a] stream is perennial. Conversely, in areas where the ground-water
contributing area is less than the surface-water drainage area, the equation
may overestimate the probability that a stream is perennial.”
This report by Bent and Archfield also illustrates how data quality,
selection, and measurement bias can restrict a model's applicability.
“The accuracy of the logistic regression equation is a function of the
quality of the data used in its development. These data include the
measured perennial or intermittent status of a stream site, the occurrence of
unknown regulation above a site, and the measured basin characteristics.
“The measured perennial or intermittent status of stream sites in
Massachusetts is based on information in the USGS NWIS database. Streamflow
measured as less than 0.005 ft³/s is rounded down to zero, so it is possible
that several streamflow measurements reported as zero may have had flows
less than 0.005 ft³/s in the stream. This measurement would cause stream
sites to be classified as intermittent when they actually are perennial.”
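The effect of this reporting rule is easy to simulate. In the minimal Python sketch below, the 0.005 ft³/s cutoff comes from the quoted passage, but the classification rule (a site is called intermittent if any recorded low-flow measurement is zero) and the function names `reported_flow` and `classify` are simplifying assumptions for illustration, not the procedure Bent and Archfield used.

```python
# Reporting limit taken from the quoted passage: flows below it are recorded as zero.
REPORTING_LIMIT_FT3_S = 0.005


def reported_flow(true_flow_ft3_s: float) -> float:
    """Mimic the database rule: a flow below the limit is rounded down to zero."""
    return 0.0 if true_flow_ft3_s < REPORTING_LIMIT_FT3_S else true_flow_ft3_s


def classify(flows: list[float]) -> str:
    """Hypothetical rule: any zero low-flow measurement marks the site intermittent."""
    return "intermittent" if any(f == 0.0 for f in flows) else "perennial"


if __name__ == "__main__":
    # A perennial site whose smallest measured flow is just under the limit.
    true_flows = [0.02, 0.004, 0.01]
    recorded = [reported_flow(f) for f in true_flows]
    print("true flows:    ", true_flows, "->", classify(true_flows))
    print("recorded flows:", recorded, "->", classify(recorded))
```

Because rounding can only turn a small positive flow into zero, never the reverse, this error pushes sites in one direction only: from perennial toward intermittent.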
“Additionally, of the stream sites selected from the NWIS database, 61
of 62 intermittent-stream sites and 89 of 89 perennial-stream sites were