Sampling does not consciously search for relevant instances. One cannot
help asking how the three functions of feature selection (enabling,
focusing, and cleaning) [40] are accomplished in sampling. The answer lies
in the random mechanism underlying every sampling method. Enabling [41]
and cleaning are possible because the sample is usually smaller than the
original data, so the amount of noise and the number of irrelevant
instances in the sample shrink accordingly, provided sampling is performed
appropriately. Although sampling does not take the task at hand into
account, some forms of it can, to a limited extent, help with focusing. We
present some commonly used sampling methods below.
Purposive Sampling:
This is a method in which the sample instances are selected with a
definite purpose in view. For example, if we want to give the impression
that the knowledge of students in the P.G. Department of Information and
Communication Technology has increased, we may draw the sample only from
students who secure marks above 60% and ignore the rest. Purposive
sampling is thus a form of favoritism sampling. It suffers from the
drawback of favoritism and nepotism and does not give a representative
sample of the population.
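
The bias in the marks example can be illustrated with a minimal sketch. The
marks below are invented for illustration only, and the names `marks` and
`purposive` are hypothetical, not taken from the text.

```python
import statistics

# Hypothetical marks (percent) for ten students; the values are made up.
marks = [35, 48, 52, 58, 61, 64, 70, 77, 83, 91]

# Purposive sample: keep only students scoring above 60%, ignore the rest.
purposive = [m for m in marks if m > 60]

population_mean = statistics.mean(marks)
purposive_mean = statistics.mean(purposive)

print(f"population mean: {population_mean:.1f}")  # 63.9
print(f"purposive mean:  {purposive_mean:.1f}")   # 74.3
```

Because every discarded instance lies below every retained one, the sample
mean is necessarily higher than the population mean, which is why such a
sample is not representative.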
Random Sampling:
In this case the sample instances are selected at random [42], and the
drawback of purposive sampling is overcome. A random sample is one in
which each unit of the population has an equal chance of being included.
Suppose we want to select n instances out of N. Random sampling requires
that every one of the $^{N}C_{n}$ distinct samples has an equal chance of
being drawn. In practice, a random sample is drawn instance by instance.
When an instance that has been drawn is removed from the data set for all
subsequent draws, the method is called random sampling without
replacement. Random sampling with replacement is also feasible: at any
draw, all N instances of the data set are given an equal chance of being
drawn, no matter how often they have already been drawn.
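
The two variants can be sketched with Python's standard `random` module.
This is an illustration under assumptions, not a prescribed implementation:
the function names, the seed parameter, and the toy data set of ten integers
are all hypothetical.

```python
import math
import random

def sample_without_replacement(data, n, seed=None):
    """Draw n distinct instances; a drawn instance cannot be drawn again."""
    rng = random.Random(seed)
    return rng.sample(data, n)

def sample_with_replacement(data, n, seed=None):
    """Draw n instances; every draw sees all N instances, so repeats can occur."""
    rng = random.Random(seed)
    return rng.choices(data, k=n)

data = list(range(10))  # N = 10 hypothetical instances
n = 4

# Number of distinct samples of size n when drawing without replacement.
print(math.comb(len(data), n))  # 210

print(sample_without_replacement(data, n, seed=0))
print(sample_with_replacement(data, n, seed=0))
```

Note that with replacement the sample size may exceed N, whereas without
replacement `random.sample` raises a `ValueError` if n is larger than the
data set.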
Stratified Sampling:
In this sampling the heterogeneous data set of
N instances is first divided into n 1 ,n 2 ,...,n k homogenous subsets. The
subsets are called strata. These subsets are non-overlapping, and together
they comprise the whole of the dataset (i.e., i =1 n i = N ). 17 The instances
are sampled at random from each of these stratums; the sample size in each
stratum varies according to the relative importance of the stratum in the
population. The sample, which is the aggregate of the sampled instances of