Agriculture Reference
In-Depth Information
have verified application strategies, and for which generalized and well-tested
software is available (De Waal 2009). The overall procedure for automatic data
editing must be designed so that the different operational phases are consistent with
each other. To simplify, it is possible to assume that the whole process consists of
the following steps:
1. Detect and impute systematic errors.
2. Selectively edit significant units.
3. Identify and impute random errors on a set of relevant variables.
4. Detect and impute random errors on a set of variables of minor importance,
subject to imputations made in the previous phases.
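The four steps above can be sketched as a small pipeline. This is only an illustrative sketch, not the procedure of any particular editing package: the function names, the record layout, the "factor of 1000" unit-error rule, the share-of-total review score, and median imputation are all assumptions chosen for brevity.

```python
from statistics import median

def fix_unit_errors(records, var, factor=1000, ratio_limit=300):
    """Step 1 (assumed rule): values far above the median are treated as a unit error."""
    observed = [r[var] for r in records if r[var] is not None]
    ref = median(observed) if observed else 0
    fixed = []
    for r in records:
        r = dict(r)  # copy so the raw data stay untouched
        if r[var] is not None and ref > 0 and r[var] / ref > ratio_limit:
            r[var] = r[var] / factor
        fixed.append(r)
    return fixed

def select_influential(records, var, share_limit=0.5):
    """Step 2 (assumed score): flag units holding a large share of the total for manual review."""
    total = sum(r[var] for r in records if r[var] is not None)
    return [r["id"] for r in records
            if r[var] is not None and total > 0 and r[var] / total > share_limit]

def impute_missing(records, variables):
    """Steps 3 and 4 (assumed method): fill missing values with the observed median."""
    out = [dict(r) for r in records]
    for var in variables:
        observed = [r[var] for r in out if r[var] is not None]
        if not observed:
            continue  # nothing to base an imputation on
        fill = median(observed)
        for r in out:
            if r[var] is None:
                r[var] = fill
    return out

def run_pipeline(records, unit_var, key_vars, minor_vars):
    step1 = fix_unit_errors(records, unit_var)      # systematic errors
    review_ids = select_influential(step1, unit_var)  # selective editing
    step3 = impute_missing(step1, key_vars)         # relevant variables first
    step4 = impute_missing(step3, minor_vars)       # minor variables, given earlier imputations
    return step4, review_ids

# Hypothetical survey records; None marks a missing value.
records = [
    {"id": 1, "area": 12.0, "yield_t": 30.0, "workers": 2},
    {"id": 2, "area": 15000.0, "yield_t": 41.0, "workers": 3},  # suspected unit error
    {"id": 3, "area": 10.0, "yield_t": None, "workers": None},
    {"id": 4, "area": 14.0, "yield_t": 35.0, "workers": 4},
]

edited, review_ids = run_pipeline(records, "area", ["area", "yield_t"], ["workers"])
```

Each function returns new records rather than modifying its input, so the output of every phase can be analyzed and validated separately, and the raw data remain available for comparison.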
Each of these steps must include a period of analysis and validation. This analysis
must also identify any systematic biases introduced by imperfect
definitions of the edit rules, and resolve them. The editing operation should
be monitored using output documentation in the form of performance indicators.
Procedures for automatic data editing must be accompanied by an analysis of
outliers and strategies for their treatment. Outliers are hard to identify because
individual cases may be correct yet atypical (i.e., far away from the mean). A
systematic problem can only be identified if outliers are excessively frequent. Such a
problem could have been introduced in the previous editing steps, and should be
carefully considered.
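One way to check whether outliers are "excessively frequent" is to flag them with a robust rule and then compare the outlier rate across subgroups. The sketch below assumes the modified z-score based on the median and the median absolute deviation (MAD), with the commonly used cutoff of 3.5; the text does not prescribe any specific rule, and the data and group labels are invented.

```python
from collections import defaultdict
from statistics import median

def mad_outliers(values, cutoff=3.5):
    """Return one boolean per value: True where the modified z-score exceeds cutoff."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)  # no spread, so nothing can be flagged
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in values]

def outlier_rate_by_group(values, groups, cutoff=3.5):
    """Share of flagged values within each group (e.g., enumerator or office)."""
    flags = mad_outliers(values, cutoff)
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for flag, group in zip(flags, groups):
        counts[group][0] += flag
        counts[group][1] += 1
    return {g: flagged / total for g, (flagged, total) in counts.items()}

# Hypothetical values collected by two enumerators, A and B.
values = [10, 11, 9, 10, 12, 10, 50, 55, 11, 60]
groups = ["A"] * 5 + ["B"] * 5

rates = outlier_rate_by_group(values, groups)
print(rates)
```

A single flagged value may simply be a correct but unusual unit. A group in which most values are flagged, as for enumerator B here, is a stronger hint of a systematic problem that deserves investigation.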
Procedures for error identification and imputation should produce useful indicators
to monitor the production process. Examples of such indicators include
tables showing the number of errors found in the whole survey data set and for each
checked variable. It is also important to analyze the variability of the indicators
among subgroups of units, aggregated according to the geographical domain,
administrative office, or enumerator. The variability in these tables can help to
identify problems and biases introduced by the organization of the survey. The
following indicators should be provided, both aggregated and for subsets of the
population:
1. Partial nonresponse rate for each variable.
2. Violation rates of each edit rule.
3. Imputation rates for each variable and the frequency of each imputation
criterion.
4. Transition matrices for the process from raw data to edited data.
5. Dissimilarities in the single (univariate) and double (bivariate) frequency distributions of some key
variables, before and after identifying and imputing the errors.
6. Differences between the estimates produced from the survey calculated on raw
and edited data.
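Several of the indicators listed above can be computed directly by comparing the raw and edited files. The following is a minimal sketch under stated assumptions: the records, variable names, and edit rules are hypothetical, the imputation rate is approximated as the share of units whose value changed, and the mean stands in for the survey estimate. The frequency of each imputation criterion is not covered, since it requires a log of which criterion was applied.

```python
from collections import Counter

# Hypothetical raw and edited files, aligned unit by unit; None marks a missing value.
raw = [
    {"crop": "wheat", "area": 12.0, "harvested": 10.0},
    {"crop": "maize", "area": None, "harvested": 8.0},
    {"crop": "wheat", "area": 5.0, "harvested": 9.0},   # harvested exceeds area
    {"crop": None, "area": 20.0, "harvested": 18.0},
]
edited = [
    {"crop": "wheat", "area": 12.0, "harvested": 10.0},
    {"crop": "maize", "area": 9.0, "harvested": 8.0},
    {"crop": "wheat", "area": 9.0, "harvested": 9.0},
    {"crop": "wheat", "area": 20.0, "harvested": 18.0},
]

def harvested_le_area(r):
    """Assumed edit rule: harvested area cannot exceed total area (skipped if missing)."""
    return r["area"] is None or r["harvested"] is None or r["harvested"] <= r["area"]

def nonresponse_rate(data, var):
    """Indicator 1: share of units with a missing value for var."""
    return sum(r[var] is None for r in data) / len(data)

def violation_rate(data, rule):
    """Indicator 2: share of units that fail the edit rule."""
    return sum(not rule(r) for r in data) / len(data)

def imputation_rate(raw_data, edited_data, var):
    """Indicator 3 (approximation): share of units whose value for var was changed."""
    return sum(a[var] != b[var] for a, b in zip(raw_data, edited_data)) / len(raw_data)

def transition_matrix(raw_data, edited_data, var):
    """Indicator 4: counts of (raw value, edited value) pairs for a categorical variable."""
    return Counter((a[var], b[var]) for a, b in zip(raw_data, edited_data))

def mean_of(data, var):
    """Simple unweighted mean of the non-missing values, used as the estimate."""
    vals = [r[var] for r in data if r[var] is not None]
    return sum(vals) / len(vals)

area_nonresponse = nonresponse_rate(raw, "area")
rule_violations_raw = violation_rate(raw, harvested_le_area)
rule_violations_edited = violation_rate(edited, harvested_le_area)
area_imputation = imputation_rate(raw, edited, "area")
crop_transitions = transition_matrix(raw, edited, "crop")
# Indicator 6: difference between the estimates on edited and raw data.
estimate_shift = mean_of(edited, "area") - mean_of(raw, "area")
```

Running the same functions on subsets of the records (for example, by geographical domain or enumerator) gives the subgroup versions of these indicators.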
A file containing the raw data should be maintained for a reasonable period to
calculate these indicators. To ensure that the procedures are correctly applied, they
should be checked periodically, and the completeness of the
required documentation should be monitored. Automated editing procedures are computationally
intensive, and highly skilled technical and statistical personnel may be required.