Agriculture Reference
surveys based on a geographical definition of the statistical unit, because it is
assumed that their weight has very little variability in the target population.
It should be noted that no data-editing program can automatically detect
and impute every error in the data. In general, only errors that violate some rules
(identifiable errors) can be detected and processed appropriately to
resolve the inconsistencies. Such an imputation does not necessarily restore the
true information; it replaces the value with one that we estimate to be closer to the
true value, using a set of logical rules that we believe are valid for the collected data.
Therefore, automatic editing may be seen as a way to increase the
quality of the data by constraining them to prior knowledge.
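To make the notion of identifiable errors concrete, the following Python sketch checks each record against a set of edit rules and reports the rules it violates. The variable names (`crop_area_ha`, `total_area_ha`, `production_t`), the three rules, and the sample farms are hypothetical illustrations, not rules taken from a real survey.

```python
from typing import Callable, Dict, List

# A record maps variable names to values (hypothetical farm variables).
Record = Dict[str, float]
# An edit rule returns True when the record satisfies it.
EditRule = Callable[[Record], bool]

# Hypothetical edit rules; real rules come from subject-matter knowledge.
RULES: Dict[str, EditRule] = {
    "crop_area_within_total": lambda r: r["crop_area_ha"] <= r["total_area_ha"],
    "non_negative_total": lambda r: r["total_area_ha"] >= 0,
    # Production is only possible if some area was cultivated.
    "yield_needs_area": lambda r: r["crop_area_ha"] > 0 or r["production_t"] == 0,
}


def failed_edits(record: Record, rules: Dict[str, EditRule] = RULES) -> List[str]:
    """Return the names of the edit rules that the record violates."""
    return [name for name, rule in rules.items() if not rule(record)]


farms = [
    {"crop_area_ha": 12.0, "total_area_ha": 20.0, "production_t": 60.0},
    {"crop_area_ha": 35.0, "total_area_ha": 20.0, "production_t": 150.0},
    {"crop_area_ha": 0.0, "total_area_ha": 5.0, "production_t": 8.0},
]

for i, farm in enumerate(farms):
    print(i, failed_edits(farm))
```

Note that a record with a wrong but plausible value (say, 12 ha reported instead of 15 ha) passes every rule and stays undetected, which is why only identifiable errors can be treated.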
For this reason, we should correct the data only if we judge that the errors
reduce the quality of the information below a predefined level, and if we believe
that the available auxiliary information, applied in the form of compatibility rules,
can correct them. The main difficulty is generally to identify this information
correctly. Indeed, if we define inappropriate logical rules or apply inadequate
procedures, we can introduce a serious bias into the estimates.
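The decision rule above can be sketched as a small Python function. Measuring quality as the share of records that pass all edits, and the 95% level used in the example, are assumptions made for illustration; the text does not prescribe a particular quality indicator.

```python
def should_correct(n_failing: int, n_total: int, min_pass_rate: float,
                   rules_validated: bool) -> bool:
    """Decide whether automatic correction is warranted.

    Assumed quality measure: the share of records that pass all edit rules.
    Correction is recommended only when this share falls below the
    predefined level AND the edit rules have been judged valid for the data.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_failing <= n_total:
        raise ValueError("n_failing must be between 0 and n_total")
    pass_rate = 1.0 - n_failing / n_total
    return rules_validated and pass_rate < min_pass_rate


print(should_correct(30, 1000, 0.95, rules_validated=True))   # 97% pass: no correction
print(should_correct(80, 1000, 0.95, rules_validated=True))   # 92% pass: correct
print(should_correct(80, 1000, 0.95, rules_validated=False))  # rules not trusted: no correction
```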
An incorrectly defined set of edit rules can cause further problems instead
of detecting errors. For instance, we can introduce biases by addressing the
errors only partially, treating some errors accurately while ignoring others.
Additionally, many edit rules can be defined for a single survey, and they may conflict
with each other, leading to inconsistencies. We may also define redundant edit
rules. Even if they are consistent, they can result in too many corrections, which
contradicts the principle of correcting the data as little as possible.
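Conflicting and redundant rules can sometimes be found before any data are processed. The Python sketch below assumes the simplest case of univariate range edits (`lower <= value <= upper`); the `RangeEdit` class and the rule names are hypothetical. Two ranges on the same variable that do not overlap conflict, since no record can satisfy both, and a range that contains another is implied by it, so it never fails on its own.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class RangeEdit:
    """Hypothetical univariate edit: lower <= value of `variable` <= upper."""
    name: str
    variable: str
    lower: float
    upper: float


def audit_edits(edits: List[RangeEdit]) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (conflicting pairs, redundant pairs).

    Redundant pairs are reported as (implying edit, implied edit): the implied
    edit's range contains the implying edit's range, so it can never fail alone.
    """
    conflicts: List[Tuple[str, str]] = []
    redundant: List[Tuple[str, str]] = []
    for a, b in combinations(edits, 2):
        if a.variable != b.variable:
            continue  # this pairwise check only compares rules on the same variable
        if max(a.lower, b.lower) > min(a.upper, b.upper):
            conflicts.append((a.name, b.name))  # empty intersection
        elif b.lower <= a.lower and a.upper <= b.upper:
            redundant.append((a.name, b.name))  # b is implied by a
        elif a.lower <= b.lower and b.upper <= a.upper:
            redundant.append((b.name, a.name))  # a is implied by b
    return conflicts, redundant


edits = [
    RangeEdit("wheat_yield_max", "wheat_yield_t_ha", 0.0, 12.0),
    RangeEdit("wheat_yield_broad", "wheat_yield_t_ha", 0.0, 20.0),
    RangeEdit("cattle_rule_a", "cattle_heads", 0, 50),
    RangeEdit("cattle_rule_b", "cattle_heads", 100, 5000),
]
conflicts, redundant = audit_edits(edits)
print("conflicting:", conflicts)
print("redundant:", redundant)
```

Auditing general multivariate edits requires more elaborate methods than this pairwise interval check.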
Problems can also arise if we treat some errors with improper methods. Treating
deterministic errors with imputation methods suited to random errors may introduce
significant biases into the data. Additionally, the automatic editing phase may not be
the best place to correct some errors. When the recording (data entry) of the collected
data can be controlled, it is good practice to correct recording errors at that stage:
automatic data editing will probably identify the errors caused by incorrect recording,
but it will impute them inefficiently. By correcting these errors when they are generated,
we obtain a better approximation of the correct values.
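As an illustration of a deterministic error, consider a farm that reports its area in square metres instead of hectares, so the value is about 10,000 times too large. The hedged Python sketch below corrects such a value deterministically by comparing it with a reference area; the reference source (for example, a previous survey or an administrative register), the function name `fix_unit_error`, and the 20% tolerance are assumptions.

```python
from typing import Optional

SQM_PER_HA = 10_000  # 1 hectare = 10,000 square metres


def fix_unit_error(reported_area: float, reference_area_ha: float,
                   tolerance: float = 0.2) -> Optional[float]:
    """Correct a suspected unit error (area reported in m2 instead of ha).

    If the ratio between the reported value and the reference area is within
    `tolerance` (relative) of 10,000, return the value converted to hectares.
    Otherwise return None and leave the value for other treatment.
    """
    if reference_area_ha <= 0:
        return None  # no usable reference: cannot diagnose a unit error
    ratio = reported_area / reference_area_ha
    if abs(ratio / SQM_PER_HA - 1.0) <= tolerance:
        return reported_area / SQM_PER_HA
    return None


print(fix_unit_error(245_000, 25.0))  # looks like m2: converted to hectares
print(fix_unit_error(31.0, 25.0))     # no unit pattern: left for other methods
```

A random-error method, such as replacing the value with a donor or mean value, would discard the information carried by the reported figure, whereas dividing by 10,000 recovers the farm's own figure. Values that do not fit the pattern are returned as `None`.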
In the case of interactive data editing, a serious problem may occur if one or
more operators do not comply with the established procedures. The effect of any
bias introduced in this way may be even greater than in other cases, because
interactive corrections are expected to restore information close to reality. Indeed,
this mode of operation is usually applied to very influential units, for example, large
farms. In this case, the first step is to return to the questionnaire, or to consult
administrative archives or other sources. If the information therein is not considered
reliable, it must be collected again by returning to the farm (Berthelot and Latouche
1993).
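Choosing which units deserve interactive treatment can be sketched as a simple form of selective editing: compute each unit's weighted contribution to the estimated total and flag those above a share threshold. The score function, the 20% threshold, and the farm data below are hypothetical; selective-editing scores used in practice often also account for how suspicious each reported value is.

```python
from typing import Dict, List


def select_for_interactive_review(values: Dict[str, float],
                                  weights: Dict[str, float],
                                  share_threshold: float) -> List[str]:
    """Return IDs of units whose weighted contribution to the estimated total
    exceeds `share_threshold`, largest contribution first."""
    contributions = {uid: weights[uid] * v for uid, v in values.items()}
    total = sum(contributions.values())
    if total == 0:
        return []
    flagged = [uid for uid, c in contributions.items() if c / total > share_threshold]
    return sorted(flagged, key=lambda uid: contributions[uid], reverse=True)


# Hypothetical cultivated area (ha) and sampling weights.
area = {"farm_A": 1200.0, "farm_B": 40.0, "farm_C": 15.0, "farm_D": 900.0}
weight = {"farm_A": 1.0, "farm_B": 10.0, "farm_C": 10.0, "farm_D": 1.0}

print(select_for_interactive_review(area, weight, share_threshold=0.2))
```

Units not flagged would be left to automatic editing, keeping the costly interactive work, and the risk of operator-induced bias, concentrated on the few units that matter most for the estimates.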
Automatic editing procedures should be designed to prevent the introduction of
errors and biases during implementation. Thus, the plan should first carefully assess
whether an imputation process is actually required, rather than simply identifying and
counting incompatibilities in the data. In general, it is good practice to give
priority to methods that have well-known theoretical and statistical properties,