such a manner that they become planned. In other words, n_h is fixed, where h is the domain code.
In many agricultural surveys, the Xs are typically size measures or, more generally, continuous covariates. Their main use is not in the sample design itself, but in actions performed after sample selection.
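A minimal sketch of this estimation-stage use of the auxiliary information is given below, assuming simulated data and the classical ratio estimator of a total; the variable names and the sample size are illustrative, not values from the text.

# Simulated sketch: a size measure x known for every frame unit is used only
# after selection, through the classical ratio estimator of a total.
set.seed(1)
N <- 1000
x <- rgamma(N, shape = 2, scale = 10)        # size measure available in the frame
y <- 5 * x + rnorm(N, sd = 20)               # survey variable, observed on the sample only
s <- sample(N, 100)                          # illustrative simple random sample
t_exp   <- N * mean(y[s])                    # expansion estimator, ignores x
t_ratio <- sum(x) * mean(y[s]) / mean(x[s])  # ratio estimator, uses x at estimation
c(expansion = t_exp, ratio = t_ratio, true = sum(y))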
The most common context for the production of sample estimates is a standard design. The auxiliary information is used only after the data have been collected and edited. It is in this phase that national statistical institutes (NSIs) put the greatest effort into the use and development of very complex estimators that can lead to efficiency improvements (see Chap. 10). Note that, in sample design, the common procedure is to stratify according to size, using a set of threshold levels for each auxiliary variable in the sampling frame.
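As a purely illustrative sketch of this threshold-based practice, the R code below crosses two hypothetical size measures cut at arbitrary threshold levels; the variable names and cut points are assumptions, not values from the text.

# Hypothetical frame with two size measures; thresholds are illustrative.
set.seed(1)
frame <- data.frame(utilized_area   = rgamma(5000, shape = 2, scale = 10),
                    livestock_units = rpois(5000, lambda = 15))
frame$area_cls <- cut(frame$utilized_area, breaks = c(0, 5, 20, 50, Inf),
                      labels = c("small", "medium", "large", "very large"),
                      right = FALSE)
frame$lstk_cls <- cut(frame$livestock_units, breaks = c(0, 10, 100, Inf),
                      right = FALSE)
# Strata are obtained by crossing the threshold classes of each covariate.
frame$stratum <- interaction(frame$area_cls, frame$lstk_cls, drop = TRUE)
table(frame$stratum)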
Most of the literature on optimal stratification relies on the early works of Dalenius and Hodges (see Horgan 2006 for a review). Their solutions, based on simple approximation rules such as the cumulative square-root-frequency method, are still very popular in applied survey sampling. This strategy is typically complemented by the introduction of a take-all (completely enumerated) stratum and one or more take-some (sampled) strata. This procedure is commonly used by NSIs to select samples, but it is hard to define the boundaries of such strata uniquely when they are based on a multivariate set of size measures.
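A minimal sketch of the take-all/take-some idea, assuming a single simulated size measure and an arbitrary cutoff at its 99th percentile (both choices are illustrative, not prescriptions from the text):

# Units above the cutoff are censused; the rest form a sampled stratum.
set.seed(1)
size   <- rgamma(5000, shape = 2, scale = 10)
cutoff <- quantile(size, 0.99)
take_all  <- which(size >  cutoff)           # take-all (census) stratum
take_some <- which(size <= cutoff)           # take-some (sampled) stratum
n_ts <- 300                                  # illustrative take-some sample size
smp  <- c(take_all, sample(take_some, n_ts)) # final sample of unit indices
length(smp)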
A generalization of these procedures is suggested by Baillargeon and Rivest (2009, 2011), and can be used when the survey and stratification variables differ. In the R package stratification, optimization rules are available to define the optimal stratum boundaries of a covariate. However, these classical methods deal only with the univariate case, and cannot easily be extended to multiple stratification covariates.
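A minimal sketch of how such univariate boundary optimization might be invoked is given below; the function and argument names (strata.cumrootf, strata.LH, CV, Ls, takeall, algo) follow the documented interface of the stratification package and should be checked against the package manual.

# Simulated size measure; four strata and a 5% target CV are illustrative choices.
library(stratification)
set.seed(1)
x <- rgamma(5000, shape = 2, scale = 10)
# Cumulative square-root-frequency boundaries.
cum <- strata.cumrootf(x = x, CV = 0.05, Ls = 4)
# Lavallee-Hidiroglou-type optimization (Kozak algorithm) with one take-all stratum.
lh <- strata.LH(x = x, CV = 0.05, Ls = 4, takeall = 1, algo = "Kozak")
cum$bh; cum$nh                               # stratum boundaries and allocations
lh$bh;  lh$nh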
Within this context, the use of stratification trees (Benedetti et al. 2008) has several advantages over classical univariate Dalenius-type methods. First, stratification trees require neither distributional assumptions about the target variable nor any hypothesis on the functional form of the relation between this variable and the covariates. Moreover, when many auxiliary variables are available, the stratification tree algorithm can automatically select the most powerful variables for constructing the strata. The identified strata are easier to interpret than those based on linear methods. Finally, stratification trees do not require a separate sample allocation step, as they simultaneously allocate the sampling units (Benedetti et al. 2008).
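To make the general idea concrete, the sketch below greedily splits frame strata on covariate thresholds whenever a split reduces the Neyman sample size required to reach a target coefficient of variation for the total of a proxy variable. It is a simplified illustration of a tree-style partition under these stated assumptions, not the Benedetti et al. (2008) algorithm, and every name in it is hypothetical.

# Simplified, illustrative tree-style stratification (not the Benedetti et al.
# 2008 algorithm): accept a split only if it lowers the Neyman sample size
# needed for a target CV of the total of x, a frame covariate used here as a
# proxy for the unobserved survey variable.
neyman_n <- function(strata, x, cv) {
  N_h <- tapply(x, strata, length)
  S_h <- tapply(x, strata, sd)
  S_h[is.na(S_h)] <- 0                        # single-unit strata
  V   <- (cv * sum(x))^2                      # tolerated variance of the total
  ceiling(sum(N_h * S_h)^2 / (V + sum(N_h * S_h^2)))
}

grow_strata <- function(X, x, cv = 0.05, max_strata = 6) {
  strata <- rep(1L, nrow(X))
  while (max(strata) < max_strata) {
    n_best <- neyman_n(strata, x, cv)
    best <- NULL
    for (h in unique(strata)) {               # candidate stratum to split
      in_h <- strata == h
      for (j in seq_len(ncol(X))) {           # candidate splitting covariate
        for (cut in unique(quantile(X[in_h, j], seq(0.1, 0.9, 0.1)))) {
          cand <- strata
          cand[in_h & X[, j] > cut] <- max(strata) + 1L
          n_cand <- neyman_n(cand, x, cv)
          if (n_cand < n_best) { n_best <- n_cand; best <- cand }
        }
      }
    }
    if (is.null(best)) break                  # no split lowers the sample size
    strata <- best
  }
  list(strata = strata, n = neyman_n(strata, x, cv))
}

# Illustrative use on a simulated frame with two covariates.
set.seed(1)
X <- cbind(area = rgamma(5000, shape = 2, scale = 10), lstock = rpois(5000, 15))
res <- grow_strata(X, x = X[, "area"], cv = 0.03)
table(res$strata); res$n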
Optimal data partitioning is a classical problem in the statistical literature, following the seminal work of Fisher on linear discriminant analysis (Fisher 1936). However, our problem is more directly related to the use of unsupervised classification methods for clustering a set of units (in this case, a population frame). The main difference between the two problems lies in their objective functions: the aim in sampling design is usually to minimize the sample size, while in clustering it is common practice to minimize the within-cluster variance. There is an intuitive connection between these two concepts, even if the