recommendations for the correct design and implementation of this important
phase.
Editing software consists of automatic procedures for both error detection and
imputation. These procedures can be classified according to the type of error,
as errors can be grouped into systematic and random errors. For systematic errors,
we can assume that the correct value is unique for any identifiable subpopulation.
Conversely, random errors are expected to have a margin of residual variability
with respect to possible imputations, regardless of the criterion used to split the data
into subpopulations.
When two or more variables generate an inconsistency, it is often necessary to
make strong assumptions about which variable is wrong (De Waal and Pannekoek
2010).
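A minimal sketch of such an inconsistency, with hypothetical field names, is a balance edit in which component items do not sum to the reported total; the check detects the violation, but by itself cannot say which of the variables is in error:

```python
def violates_balance_edit(record, parts, total, tol=0.5):
    """Return True when the component items do not add up to the reported
    total (within a tolerance). The edit only detects the inconsistency;
    deciding which variable is actually wrong requires assumptions."""
    return abs(sum(record[p] for p in parts) - record[total]) > tol

# a farm record where crop and pasture areas do not add to the total area
farm = {"crop_ha": 40.0, "pasture_ha": 25.0, "total_ha": 80.0}
flagged = violates_balance_edit(farm, ["crop_ha", "pasture_ha"], "total_ha")
```

Any of the three fields could be the culprit here, which is why resolving the edit needs assumptions beyond the rule itself.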
The aim of methods used to resolve incompatibilities is to make the data
admissible while minimizing the effect on the estimates of each variable of
interest. In general,
instead of imputing a suspect or obviously incorrect value, we can use a new contact
from the reporting unit to capture the true value, use information from a previous
period, or replace the inconsistent information with information from similar units.
The latter solution is called probabilistic imputation and is often used to correct
large amounts of data collected on mostly homogeneous statistical units. This
method is cheap, but it must be applied with extreme care to ensure that a strong
bias is not introduced into an important population parameter.
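The replacement of inconsistent values with information from similar units can be sketched as a donor-based (hot-deck) imputation; the field names and the use of a single classifying variable below are illustrative assumptions, not from the source:

```python
import random

def hotdeck_impute(records, field, class_field, rng=random):
    """Replace missing values of `field` with the value of a randomly
    chosen donor record from the same class of similar units."""
    # clean records act as donors, grouped by imputation class
    donors = {}
    for r in records:
        if r[field] is not None:
            donors.setdefault(r[class_field], []).append(r[field])
    for r in records:
        if r[field] is None:
            pool = donors.get(r[class_field])
            if pool:  # impute only when the class contains a donor
                r[field] = rng.choice(pool)
    return records
```

Drawing donors within classes of homogeneous units helps preserve the variability of the observed data, but a careless choice of classes can still bias population totals, hence the caution above.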
Data editing procedures can be divided into two classes according to the
nature of the errors. The first group corrects systematic errors using a set of
deterministic IF-THEN rules. The second group is mainly devoted to
treating random errors while having as small an influence as possible on the final
estimates. These methods alter the minimum set of information, such that the
admissible ranges of values are respected and the imputed data has the same
variability as the observed data that has not been affected by errors. Systematic
and random errors typically both exist in a data file, so the editing
procedures must be applied in a particular order: the preliminary procedures
that identify and correct systematic errors are typically followed by the
probabilistic procedures for random errors.
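The ordering described above can be sketched as follows; the IF-THEN rules and field names are hypothetical examples of systematic-error corrections, and the probabilistic step is passed in as a separate stage:

```python
def fix_systematic(record):
    """Deterministic IF-THEN rules for known systematic errors."""
    r = dict(record)
    # IF the area looks like square metres THEN convert to hectares
    # (a typical unit-of-measure error, assumed here for illustration)
    if r["area_ha"] is not None and r["area_ha"] > 10_000:
        r["area_ha"] /= 10_000
    # IF the livestock count is negative THEN treat it as a sign error
    if r["livestock"] is not None and r["livestock"] < 0:
        r["livestock"] = -r["livestock"]
    return r

def edit_pipeline(records, impute_random_errors):
    # 1) deterministic treatment of systematic errors comes first
    cleaned = [fix_systematic(r) for r in records]
    # 2) probabilistic treatment of the remaining random errors follows
    return impute_random_errors(cleaned)
```

Keeping the two stages separate makes the required order explicit: the probabilistic imputer only ever sees data from which the identifiable systematic errors have already been removed.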
Some data editing methods avoid the practice of correcting all possible errors,
and only consider those that have a large influence on the estimates of interest. This
is called selective editing (Latouche and Berthelot 1992), and is particularly
appropriate when the statistical units have very different influences on the observed
phenomenon. In this case, we should carefully correct only the most important
units, even using expensive methods such as returning to the field. It is important to
note that these techniques are mainly applied to interactive editing, given that after
identifying the errors we wish to assign true values by contacting the respondents.
For example, when analyzing a population of farms, we could first apply selective
editing techniques to the larger farms (in terms of agricultural land, livestock, gross
income, or number of employees) and then later apply a probabilistic procedure to
the smaller and more numerous farms. Obviously, this practice is not widely used in
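A common way to operationalize selective editing is to score each unit by combining its suspicion (the deviation of the reported value from an anticipated one) with its influence (its weight in the estimate), and to follow up only the top-ranked units. The sketch below is a simplified score in the spirit of this idea, not the exact formula of Latouche and Berthelot (1992), and the field names are assumptions:

```python
def editing_priority(records, reported, anticipated, weight):
    """Rank units by the potential impact of a suspicious value on a
    weighted total: influence (weight) times suspicion (absolute
    deviation of the reported value from the anticipated value)."""
    scored = [
        (r["id"], r[weight] * abs(r[reported] - r[anticipated]))
        for r in records
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Only the top-ranked units (e.g. the largest farms) would then be recontacted interactively, while the remaining units are passed to a probabilistic imputation procedure.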