Agriculture Reference
In-Depth Information
However, in agricultural surveys that use farms or other legal bodies as statistical
units, unit and item nonresponse can both occur. They are treated separately using
different methods.
When dealing with unit nonresponse, we generally prefer to adjust the sampling
weights rather than impute missing values. The reverse holds for item
nonresponse, where imputation is generally preferred.
Imputation is a procedure that replaces missing values with artificial data
generated under explicit modeling assumptions. Sampling estimation is then
performed on the completed data set. Imputed values are estimates, so they are
affected by errors that can be viewed as measurement errors, as when an erroneous
value is recorded for a selected unit. Commonly used imputation techniques are
regression imputation, nearest neighbor imputation, hot deck imputation, and
multiple imputation (for a review, see Little and Rubin 2002).
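As a rough illustration of two of the techniques listed above, the following Python sketch imputes missing values by hot deck (a donor drawn at random from respondents in the same class) and by simple linear regression on one auxiliary variable. The function names, the use of imputation classes, and the toy data are assumptions made for this example; they are not taken from the cited literature.

```python
import random
from statistics import mean


def hot_deck_impute(values, classes, rng):
    """Fill None entries with a value drawn at random from a respondent
    (donor) in the same imputation class. Classes are an assumption here."""
    donors = {}
    for v, c in zip(values, classes):
        if v is not None:
            donors.setdefault(c, []).append(v)
    completed = []
    for v, c in zip(values, classes):
        if v is None:
            if c not in donors:
                raise ValueError(f"no donor available in class {c!r}")
            completed.append(rng.choice(donors[c]))
        else:
            completed.append(v)
    return completed


def regression_impute(y, x):
    """Fill None entries of y with predictions from a least-squares line
    of y on x, fitted on the observed (x, y) pairs only."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    x_bar = mean([xi for xi, _ in obs])
    y_bar = mean([yi for _, yi in obs])
    sxx = sum((xi - x_bar) ** 2 for xi, _ in obs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in obs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi if yi is None else yi
            for xi, yi in zip(x, y)]


# Toy data (hypothetical): farm area as auxiliary variable, crop output as y.
area = [10, 20, 30, 40]
output = [5.0, None, 15.0, 20.0]
print(regression_impute(output, area))  # observed points lie on y = 0.5 * x
print(hot_deck_impute([5.0, None, 7.0, 9.0], ["A", "A", "A", "B"],
                      random.Random(0)))
```

Both functions return a completed data set, so any later estimate inherits the imputation error discussed above. A deterministic regression prediction also understates the variability of the imputed variable.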
However, these solutions are subjective and are generally tailored to each
individual survey. They are rarely robust to different hypotheses on the response
behavior. Note that, in surveys based on list frames of farms,
the response rate should be carefully evaluated as the ratio of the number of
respondents to the number of eligible units. This is because missing data may
also arise from non-eligible (or out of scope of the survey) units that have been
wrongly included in the frame. Thus, because of errors in the frame, the
denominator (the number of eligible units) must be estimated (Brick and Montaquila 2009).
For this reason, it is very important to distinguish between the different reasons for
missing data. These problems do not arise with a frame of spatial units.
The eligibility of a spatial unit is obvious and does not change rapidly over time,
so we can assume that there are no overcoverage or undercoverage errors in the frame.
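To make the point about the estimated denominator concrete for list frames, here is a minimal Python sketch. It assumes that units whose eligibility could not be determined are eligible at the same rate as the units whose status was resolved. That assumption, the function name, and the counts are illustrative only and are not claimed to be the method of Brick and Montaquila (2009). The calculation is also unweighted for simplicity.

```python
def estimated_response_rate(n_respondents, n_eligible_nonrespondents,
                            n_ineligible, n_unknown):
    """Response rate with an estimated number of eligible units.

    Assumption: units of unknown eligibility are eligible at the same
    rate as units whose eligibility status was resolved.
    """
    n_known_eligible = n_respondents + n_eligible_nonrespondents
    n_resolved = n_known_eligible + n_ineligible
    eligibility_rate = n_known_eligible / n_resolved
    # Estimated denominator: known eligible units plus the estimated
    # eligible share of the unresolved units.
    n_eligible_hat = n_known_eligible + eligibility_rate * n_unknown
    return n_respondents / n_eligible_hat


# Hypothetical counts for a sample of 1000 farms drawn from a list frame.
rate = estimated_response_rate(
    n_respondents=600,
    n_eligible_nonrespondents=150,
    n_ineligible=50,
    n_unknown=200,
)
naive = 600 / 1000  # treats every sampled unit as eligible
print(round(rate, 4), naive)
```

With these counts the estimated eligibility rate is 750/800, so the estimated denominator is 937.5 rather than 1000. The resulting response rate is 0.64, compared with 0.60 when every sampled unit is treated as eligible.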
The most widely accepted approach for accounting for unit nonresponses is
nonresponse propensity weighting (Haziza et al. 2010). It is based on the hypothesis
that the response mechanism can be viewed as a two-phase design (see Sect. 6.7).
The first phase, s, is the sampling scheme designed by the survey planners. The
second phase, r, is a subsample, in which only the respondent set is observed
according to an unknown random criterion. Here, each unit k in the population
has a response probability, $P(k \in r \mid s) = r_k$. To estimate these probabilities, we must
specify a working model according to any dependencies on the auxiliary variables,
which can be fitted using a sample or the population data (Särndal and Lundström
2005). This framework is based on two types of information: the known first-order
inclusion probabilities $\pi_k$ (design-based), and the unknown response probabilities
$\hat{r}_k$ (estimated using an appropriate model). This mixture of design and model-based
inference is in itself enough to generate some discrepancies, even before
considering that any expansion procedure requires all the response probabilities to be
strictly positive (Kott 1994). This requirement seems intuitive, but it is often
violated. This is particularly the case in surveys based on spatial units, where a
unit may not be observable at all because of barriers or physical impediments
(so it has a response probability of 0). Moreover, response modeling is typically
justified by the simplicity with which it handles the formal aspects of the adjusted
estimates, and not by prior knowledge of the response process. As a consequence, it is very
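
The two-phase view of nonresponse described above can be sketched in Python as follows. The working model used here is a response homogeneity group model, in which every unit of a group shares one response probability, estimated by the design-weighted response rate of the group. This model, the function name, and the toy sample are assumptions chosen for illustration; the source does not prescribe a specific model. The adjusted estimator of a total is the sum over respondents of y_k / (pi_k * r_hat_k).

```python
from collections import defaultdict


def propensity_adjusted_total(sample):
    """Estimate a population total under a two-phase view of nonresponse.

    sample: list of dicts with keys
        'pi'        first-order inclusion probability (known, design-based)
        'group'     response homogeneity group (assumed working model)
        'responded' True if the unit responded
        'y'         observed value (only used for respondents)
    Returns (estimated total, estimated response probability per group).
    """
    design_weight = defaultdict(float)      # sum of 1/pi over the sample s
    respondent_weight = defaultdict(float)  # sum of 1/pi over respondents r
    for unit in sample:
        d = 1.0 / unit["pi"]
        design_weight[unit["group"]] += d
        if unit["responded"]:
            respondent_weight[unit["group"]] += d

    r_hat = {g: respondent_weight[g] / design_weight[g] for g in design_weight}

    # Expansion needs strictly positive response probabilities.
    empty = [g for g, p in r_hat.items() if p <= 0.0]
    if empty:
        raise ValueError(f"groups with no respondents: {empty}")

    total = sum(unit["y"] / (unit["pi"] * r_hat[unit["group"]])
                for unit in sample if unit["responded"])
    return total, r_hat


# Hypothetical sample of four units, each with inclusion probability 0.1.
sample = [
    {"pi": 0.1, "group": "A", "responded": True, "y": 10.0},
    {"pi": 0.1, "group": "A", "responded": False, "y": None},
    {"pi": 0.1, "group": "B", "responded": True, "y": 20.0},
    {"pi": 0.1, "group": "B", "responded": True, "y": 30.0},
]
total, r_hat = propensity_adjusted_total(sample)
print(total, r_hat)
```

In this toy sample the estimated response probability is 0.5 in group A and 1.0 in group B, so the single respondent in group A carries twice its design weight. A group with no respondents raises an error, which mirrors the positivity requirement: a unit that can never be observed cannot be represented by reweighting the respondents.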