Agriculture Reference
In-Depth Information
\[
\sigma^2_{\hat{z}} = V_s\left(\bar{\hat{z}}_s\right) + \frac{n-1}{n}\, E_s\left(S^2_{\hat{z},s}\right), \qquad (7.1)
\]
where $\hat{z}$ is the vector of the expanded values $y_k/\pi_k$, $\pi_k$ is the first-order inclusion
probability for unit $k$, and $\sigma^2_{\hat{z}}$ is the constant, unknown population variance of the
variable $\hat{z}$. $V_s(\bar{\hat{z}}_s)$ is the variance between samples of the HT estimator of the mean
under the design $p(s)$, and $E_s(S^2_{\hat{z},s})$ is the expectation of the sample
variances of $\hat{z}$.
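The decomposition in Eq. (7.1) can be checked numerically on a toy population. The following is a minimal sketch, assuming simple random sampling without replacement (so every inclusion probability equals n/N and all samples are equally likely) and a population variance computed with divisor N. The data and the helper name `decomposition_terms` are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from statistics import mean, pvariance, variance


def decomposition_terms(z, n):
    """Return (population variance, between-sample variance of the mean,
    expected within-sample variance) over all equally likely samples of size n."""
    samples = list(combinations(z, n))
    between = pvariance([mean(s) for s in samples])  # V_s of the sample mean
    within = mean(variance(s) for s in samples)      # E_s of S^2 (divisor n - 1)
    return pvariance(z), between, within


# Hypothetical data: N = 6 units, n = 3, equal inclusion probabilities n / N.
y = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]
n = 3
pi = n / len(y)
z = [y_k / pi for y_k in y]  # expanded values y_k / pi_k

sigma2, between, within = decomposition_terms(z, n)
rhs = between + (n - 1) / n * within
print(sigma2, rhs)  # the two numbers should agree up to rounding
```

Because the population variance is fixed, any design that raises the expected within-sample variance must lower the between-sample variance of the estimated mean by the same amount (after the factor (n - 1)/n).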
It can be seen from Eq. (7.1) that the HT estimator can be made more efficient by
setting the first-order inclusion probabilities so that $y_k/\pi_k$ is approximately
constant (equivalently, so that the $\pi_k$ are approximately proportional to $y_k$; see
Sect. 6.4) and/or by defining a design $p(s)$ that increases the expected within-sample
variance. Intuitively, because the population variance in Eq. (7.1) is fixed, any increase
in the expected within-sample variance must be offset by a decrease in the between-sample
variance of the estimator: the more information each sample $s$ contains, the smaller
the uncertainty in the estimation process. This consideration suggests that we should
find a rule that makes the probability $p(s)$ of selecting a sample $s$ proportional, or
more than proportional, to its variance $S^2$ (which is usually regarded as an indicator
of information content). This variance is unknown because it refers to the unobserved
target variable $y$. The idea therefore remains purely theoretical unless we can find
auxiliary information on the sample variance $S^2$.
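To make the idea concrete, here is a minimal sketch of a design whose selection probabilities are proportional to the sample variance of an auxiliary variable. The auxiliary variable `x`, the data, and the function names are assumptions introduced for illustration. The sketch enumerates every sample, which is feasible only for very small populations, and it is not presented as the procedure adopted in the text.

```python
from itertools import combinations
from statistics import variance


def variance_proportional_design(x, n):
    """Map each sample (tuple of unit indices) to p(s) proportional to S^2 of x in s."""
    samples = list(combinations(range(len(x)), n))
    weights = [variance([x[k] for k in s]) for s in samples]
    total = sum(weights)
    return {s: w / total for s, w in zip(samples, weights)}


def inclusion_probabilities(design, N):
    """First-order inclusion probabilities: pi_k = sum of p(s) over samples containing k."""
    pi = [0.0] * N
    for s, p in design.items():
        for k in s:
            pi[k] += p
    return pi


# Hypothetical auxiliary variable x (known for all units) and target variable y.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [3.0, 5.0, 9.0, 15.0, 20.0]
n = 2

design = variance_proportional_design(x, n)
pi = inclusion_probabilities(design, len(x))

# Expected value of the HT estimator of the total under this design.
expected_ht = sum(p * sum(y[k] / pi[k] for k in s) for s, p in design.items())
print(pi, expected_ht, sum(y))
```

The inclusion probabilities are derived from the design itself, so the HT estimator of the total remains unbiased: its expectation over all samples equals the true total of `y`.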
When dealing with spatially distributed populations, a promising candidate for
this rule is the distance between units, as evidenced in the spatial interpolation
literature (Ripley 1981; Cressie 1993), because distance is often strongly related to
the variance of variables observed on a set of geo-referenced units. It is interesting
to note that these methods are applied far more often in the physical and environmental
sciences than to economic or social data. One of the essential tools used in this field
is the semivariogram $\gamma(d)$ (see Eq. 1.38). The shape of a semivariogram contains
valuable information for deciding whether the variance of $y$ is a function of the distance
between statistical units. Therefore, the intuitive scheme of spreading the sample
over a study region leads to efficient designs if and only if there are reasons to
assume that $\gamma(d)$ is an increasing function of the distance $d$.
This will surely happen when y has a linear or monotone spatial trend, or when
there is spatial homogeneity (i.e., closer units present very similar data). But
situations like these do not necessarily hold over the whole region, and often they
may change significantly from one zone to another. Thus, before attempting to
spread the sample units as much as possible, we must obtain an estimate $\hat{\gamma}(d)$ of
the semivariogram. This is used to confirm our hypothesis that the distance is an
efficient proxy for the variance of y. For this purpose, we do not need very accurate
information. A rough estimate, obtained from previous surveys or variables related
to y, should be enough to verify the possibility of selecting samples that are
spatially well-distributed.
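A rough estimate of this kind can be sketched with the classical method-of-moments estimator, which averages half the squared differences between pairs of units falling in the same distance class. The function name, the binning scheme, and the toy data below are assumptions made for illustration; in practice the values would come from a previous survey or from a variable related to y.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Average of 0.5 * (v_i - v_j)^2 over pairs, grouped into distance bins.

    Bin b collects pairs with distance in [b * bin_width, (b + 1) * bin_width).
    Bins without any pair are returned as None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(values)), 2):
        d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
        b = int(d // bin_width)
        if b < n_bins:
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Hypothetical transect of six units with a linear spatial trend in the values.
coords = [(float(i), 0.0) for i in range(6)]
values = [2.0 * i for i in range(6)]

gamma_hat = empirical_semivariogram(coords, values, bin_width=1.0, n_bins=4)
print(gamma_hat)
```

If the estimated values increase with the distance class, as they do for a linear trend, the hypothesis that distance is a useful proxy for the variance of y is supported, and spreading the sample over the region is a reasonable strategy.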
Moreover, we should recall the classical Yates-Grundy-Sen formulation of the
HT estimator variance to better clarify the link between sampling variance and
$\gamma$