and that using the auxiliary variables to design the sample might underestimate the
sample size needed to reach a predetermined level of precision.
This is particularly true if we are using administrative or remotely sensed data. If
the survey and auxiliary variables are the same variables recorded in two different
periods, this hypothesis can be considered acceptable.
A standard alternative is to model the unknown survey variables, Y, in terms of
the known matrix of auxiliaries, X. We wish to derive a model that relates each $y_v$
to the X observed in past surveys or other data sources (administrative data or
satellite classifications). We then assume that this model is still valid for the current
survey period (Baillargeon and Rivest 2009, 2011). The sample allocation for each
stratum is then made on the basis of the anticipated moments of Y given X. It is
important to emphasize that there are considerable advantages to designing a survey
that is repeated for several time periods, so that the variables collected at each time
have the same definition and we can determine if the phenomenon being investigated
is highly dependent on its past values.
An important issue relates to the implicit use of a linear model linking the
auxiliaries and the variable of interest, Y. Clearly, we may use a simple linear
regression if each target variable has its own counterpart within the auxiliaries, or
a multiple regression if the target variables represent entirely different information
that is merely related to the set of covariates.
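As a rough illustration of this modelling step, the sketch below fits an ordinary least squares regression to data from a past survey and uses the fitted coefficients to compute anticipated values of the survey variable on the current frame. It is only a minimal sketch under stated assumptions: the function names `fit_linear_model` and `anticipated_mean`, the synthetic data, and the choice of an intercept plus one auxiliary are all hypothetical and are not taken from the text.

```python
import numpy as np

def fit_linear_model(X_past, y_past):
    """OLS fit of y on X (intercept added); returns coefficients and residual variance."""
    X1 = np.column_stack([np.ones(len(X_past)), X_past])
    beta, *_ = np.linalg.lstsq(X1, y_past, rcond=None)
    resid = y_past - X1 @ beta
    dof = X1.shape[0] - X1.shape[1]          # residual degrees of freedom
    sigma2 = float(resid @ resid) / dof      # unbiased residual variance estimate
    return beta, sigma2

def anticipated_mean(X_frame, beta):
    """Anticipated value of y for every frame unit, given its auxiliaries."""
    X1 = np.column_stack([np.ones(len(X_frame)), X_frame])
    return X1 @ beta

# Synthetic "past survey" (assumption: one auxiliary, linear relation, constant variance).
rng = np.random.default_rng(0)
x_past = rng.uniform(1.0, 50.0, size=200)
y_past = 2.0 + 0.8 * x_past + rng.normal(0.0, 3.0, size=200)

beta_hat, sigma2_hat = fit_linear_model(x_past, y_past)

# Auxiliaries known for the whole current frame; the fitted model is assumed still valid.
x_frame = rng.uniform(1.0, 50.0, size=1000)
y_anticipated = anticipated_mean(x_frame, beta_hat)
print(beta_hat, sigma2_hat, y_anticipated[:5])
```

Passing a two-dimensional array of auxiliaries to the same functions gives the multiple regression case.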
The basis of this approach lies in the assumption that our prior knowledge
suggests that the finite population can be viewed as if it were a sample from an
infinite superpopulation, and that the model $\xi$ defines the characteristics of the
superpopulation (Isaki and Fuller 1982).
To design a survey, we should thus search for the optimal anticipated variance
(AV) of the estimator $\hat{t}$ of the total $t$. This can be defined as the variance of the
random variable $(\hat{t} - t)$ under both the design and the superpopulation model:
$$
AV(\hat{t} - t) = E_\xi\left[ E_s\left\{ (\hat{t} - t)^2 \right\} \right] - \left[ E_\xi\left\{ E_s(\hat{t} - t) \right\} \right]^2 , \tag{8.26}
$$
where $E_\xi$ denotes the expectation with respect to the model, and $E_s$ denotes the
expectation with respect to the sample design. Clearly, if the estimator is design
unbiased (as is the case with the HT estimator), the second term of Eq. (8.26) will be
equal to 0.
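The two terms of the anticipated variance can be approximated by simulation, which may help to see why the second one vanishes for a design-unbiased estimator. The sketch below is only illustrative and rests on assumptions not made in the text: simple random sampling without replacement, a single auxiliary, normal errors with constant variance, and hypothetical names (`ht_total_srs`, `anticipated_variance_mc`). The outer loop draws populations from the model, and the inner loop draws samples from each population.

```python
import numpy as np

def ht_total_srs(y_sample, N):
    """HT estimator of the total under simple random sampling (pi_k = n/N)."""
    return N / len(y_sample) * y_sample.sum()

def anticipated_variance_mc(x, beta, sigma, n, n_models=100, n_samples=100, seed=0):
    """Monte Carlo approximation of both terms of the anticipated variance."""
    rng = np.random.default_rng(seed)
    N = len(x)
    err = np.empty((n_models, n_samples))
    for m in range(n_models):
        # One population drawn from the assumed superpopulation model.
        y = beta * x + rng.normal(0.0, sigma, size=N)
        t = y.sum()
        for s in range(n_samples):
            # One sample drawn from the design (SRS without replacement).
            idx = rng.choice(N, size=n, replace=False)
            err[m, s] = ht_total_srs(y[idx], N) - t
    first = float((err ** 2).mean())   # approximates E_model[E_design{(t_hat - t)^2}]
    second = float(err.mean() ** 2)    # approximates [E_model{E_design(t_hat - t)}]^2
    return first - second, first, second

rng = np.random.default_rng(1)
N, n, beta, sigma = 300, 30, 2.0, 1.0
x = rng.uniform(1.0, 10.0, size=N)
av, first, second = anticipated_variance_mc(x, beta, sigma, n)
print(av, first, second)
```

With these settings the second term should be negligible relative to the first, up to Monte Carlo noise, because the HT estimator is design unbiased for every realised population.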
In the case of a single target variable, the linear model used when designing a sample
on the basis of an estimated y (rather than the auxiliary X) is (Särndal et al. 1992, p. 449)
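$$
\begin{cases}
y_k = \mathbf{x}_k \boldsymbol{\beta} + \varepsilon_k \\
E_\xi(\varepsilon_k) = 0 \\
V_\xi(\varepsilon_k) = \sigma_k^2 \\
E_\xi(\varepsilon_k \varepsilon_l) = 0, \quad k \neq l
\end{cases}
\tag{8.27}
$$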
where $\boldsymbol{\beta}$ is a vector of regression coefficients, and the $\varepsilon_k$ are random variables.
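To make the roles of the model components concrete, the following sketch draws repeated populations from a linear model of this form and checks that the simulated errors have mean close to zero and unit-specific variances close to the specified values. Several choices here are assumptions made only for illustration: normal errors (the model itself fixes only the first two moments), a standard deviation proportional to the square root of a size variable, and the hypothetical helper `simulate_population`.

```python
import numpy as np

def simulate_population(X, beta, sigma_k, rng):
    """One population from y_k = x_k beta + eps_k with independent, zero-mean errors."""
    eps = rng.normal(0.0, sigma_k)     # Var(eps_k) = sigma_k**2, independent across k
    return X @ beta + eps

rng = np.random.default_rng(1)
N = 500
size = rng.uniform(1.0, 20.0, size=N)        # hypothetical auxiliary (e.g. a size measure)
X = np.column_stack([np.ones(N), size])      # row k is x_k
beta = np.array([1.0, 0.5])
sigma_k = 0.3 * np.sqrt(size)                # assumed heteroscedastic pattern

# Repeated draws from the model, one row per simulated population.
draws = np.stack([simulate_population(X, beta, sigma_k, rng) for _ in range(2000)])
resid = draws - X @ beta

mean_eps = resid.mean(axis=0)                # should be near 0 for every unit
var_eps = resid.var(axis=0)                  # should be near sigma_k**2

# Anticipated moments of the population total under the model.
anticipated_total = float((X @ beta).sum())
anticipated_total_var = float((sigma_k ** 2).sum())
print(anticipated_total, anticipated_total_var)
```

Because the errors are uncorrelated across units, the model variance of the total is simply the sum of the unit variances, which is what `anticipated_total_var` computes.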