2.3 EXISTING DATA
It is natural to consider whether existing data or data collected for other purposes could be used
as reference data to reduce the cost of accuracy assessment. Such data must first be evaluated to
ascertain spatial, temporal, and classification scheme compatibility with the LC map that is the
subject of the assessment. Once compatibility has been established, the issue of sampling design
becomes relevant. Existing data may originate from either a probability or nonprobability sampling
protocol. If the data were not obtained from a probability sampling design, the inability to generalize
via rigorous, defensible inference from these data to the full population is a severe limitation. The
difficulties associated with nonprobability sampling are detailed in a separate subsection.
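The temporal and classification-scheme checks described above can be sketched as a simple screening step. This is a minimal illustration only: the record layout, the `CROSSWALK` table, the class labels, and the `max_years_apart` tolerance are all hypothetical and are not taken from any monitoring program. Spatial compatibility (for example, matching the support of a field plot to a map pixel) is not addressed here.

```python
from datetime import date

# Hypothetical crosswalk from an existing program's labels to the map legend.
CROSSWALK = {"row crops": "cropland", "pasture": "grassland", "timberland": "forest"}

def screen_record(record, map_date, max_years_apart, crosswalk):
    """Return the record's map-legend class, or None if it is incompatible."""
    years_apart = abs(record["date"].year - map_date.year)
    if years_apart > max_years_apart:
        return None  # temporal mismatch with the map
    # None is also returned when the label has no equivalent in the legend.
    return crosswalk.get(record["label"])

# Hypothetical existing-data records
records = [
    {"label": "row crops", "date": date(1992, 7, 1)},
    {"label": "timberland", "date": date(1985, 6, 1)},
    {"label": "orchard", "date": date(1992, 8, 1)},
]
screened = [screen_record(r, date(1992, 1, 1), 2, CROSSWALK) for r in records]
print(screened)
```

In this illustration only the first record survives screening; the second is too old relative to the assumed map date, and the third has no counterpart in the assumed legend.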
The greatest potential for using existing data occurs when the data have a probability-sampling
origin. Ongoing environmental monitoring programs are prime candidates for accuracy assessment
reference data. The National Resources Inventory (NRI) (Nusser and Goebel, 1997) and Forest
Inventory and Analysis (FIA) (USFS, 1992) are the most likely contributors among the monitoring
programs active in the U.S. Both programs include LC description in their objectives, so the data
naturally fit potential accuracy assessment purposes. Gill et al. (2000) implemented a successful
accuracy assessment using FIA data, and Stehman et al. (2000a) discuss use of FIA and NRI data
within a general strategy of integrating environmental monitoring with accuracy assessment.
At first glance, using existing data for accuracy assessment appears to be a great opportunity
to control cost. However, further inspection suggests that deeper issues are involved. Even when
the data are from a legitimate probability sampling design, these data will not be tailored exactly
to satisfy all objectives of a full-scale accuracy assessment. For example, the sampling design for
a monitoring program may be targeted to specific areas or resources, so coverage would be very
good for some LC classes and subregions but possibly inadequate for others. NRI, for instance,
covers nonfederal land and targets agriculture-related questions, whereas FIA focuses
on forested land. To complete a thorough accuracy assessment, it may be necessary to piece
together a patchwork of various sources of existing data plus a supplemental, directed sampling
effort to fill in the gaps of the existing data coverage. The effort required to cobble together a
seamless, consistent assessment may be significant and the statistical analysis of the data complex.
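The coverage problem can be made concrete with a simple tally of reference samples per LC class. The following is a minimal sketch, assuming the existing reference records are available as (source, class) pairs; the source names, class names, sample counts, and the `min_per_class` threshold are hypothetical and are not drawn from NRI or FIA.

```python
from collections import Counter

def coverage_gaps(records, classes, min_per_class):
    """Return {class: shortfall} for classes with too few reference samples.

    records: iterable of (source, lc_class) pairs from existing data sets.
    classes: all LC classes in the map legend.
    min_per_class: hypothetical minimum sample size wanted per class.
    """
    counts = Counter(lc_class for _source, lc_class in records)
    return {c: min_per_class - counts[c] for c in classes if counts[c] < min_per_class}

# Hypothetical illustration: two existing sources, four legend classes
records = (
    [("source_A", "cropland")] * 60
    + [("source_B", "forest")] * 80
    + [("source_A", "wetland")] * 5
)
classes = ["cropland", "forest", "wetland", "urban"]
gaps = coverage_gaps(records, classes, min_per_class=50)
print(gaps)
```

Classes that appear in `gaps` are the ones a supplemental, directed sampling effort would need to fill. A tally like this says nothing about how to combine samples drawn under different designs, which is where the analytical complexity noted above arises.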
Data from monitoring programs may carry provisions for confidentiality. This is certainly true
of NRI and FIA. Confidentiality agreements permitting access to the data will need to be negotiated
and strictly followed. Because of limited access to the data, progress may be slow if human
interaction with the reference data materials is required to complete the accuracy assessment. For
example, additional photographic interpretation for reference data using NRI or FIA materials may
be problematic because only one or two qualified interpreters may have the necessary clearance to
handle the materials. Confidentiality requirements will also preclude making the reference data
generally available for public use. This creates problems for users wishing to conduct subregional
assessments or error analyses, to construct models of classification error, or to evaluate different
spatial aggregations of the data. It is difficult to assign costs to these limitations. Existing data
save on data collection costs, but they carry hidden costs related to the complexity and
completeness of the analysis, the timeliness of reporting results, and public access to the data.
2.3.1 Added-Value Uses of Accuracy Assessment Data
In the previous section, accuracy assessment was considered an add-on to the objectives of an ongoing
environmental monitoring program. However, if accuracy data are collected via a probability
sampling design, these data may have value for more general purposes. For example, a common
objective of LC studies is to estimate the proportional representation of various cover types and
how they change over time. We can use complete coverage maps such as the NLCD to provide
such estimates, but these estimates are biased because of the classification errors present. Although
the maps represent a complete census, they contain measurement error. The reference data collected