SOFT COMPUTING FOR FEATURE SELECTION - Knowledge Mining Using Intelligent Agents


Sampling does not consciously search for relevant instances. One cannot help asking “how are the three functions (enabling, focusing, and cleaning) [40] of feature selection accomplished in sampling?” What does the trick is the random mechanism underlying every sampling method. Enabling [41] and cleaning are possible because the sample is usually smaller than the original data, so it contains correspondingly fewer noisy and irrelevant instances, provided sampling is performed appropriately. Although sampling does not take the task at hand into account, some forms of it can, to a limited extent, help focusing. We present some commonly used sampling methods below.

Purposive Sampling:

It is a method in which the sample instances are selected with a definite purpose in view. For example, if we want to give the impression that the knowledge of students in the P.G. Department of Information and Communication Technology has increased, we may draw the sample only from students who score above 60% and ignore the rest. Purposive sampling is therefore a form of favoritism: it suffers from the drawbacks of favoritism and nepotism and does not give a representative sample of the population.
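The bias described above can be made concrete with a minimal Python sketch. The marks below are hypothetical values invented for illustration, not data from the text; the only part taken from the text is the selection rule (keep students scoring above 60%).

```python
from statistics import mean

# Hypothetical marks (in percent) for ten students; illustrative values only.
marks = [35, 42, 48, 55, 58, 61, 67, 72, 80, 91]

# Purposive selection: keep only students scoring above 60%, ignore the rest.
purposive_sample = [m for m in marks if m > 60]

population_mean = mean(marks)
purposive_mean = mean(purposive_sample)

print(f"population mean: {population_mean:.1f}")
print(f"purposive mean:  {purposive_mean:.1f}")
```

Because the selection rule depends on the very quantity being summarized, the mean of the purposive sample is necessarily higher than the population mean, which is why such a sample is not representative.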

Random Sampling:

In this case the sample instances are selected at random [42], and the drawback of purposive sampling is completely overcome. A random sample is one in which each unit of the population has an equal chance of being included. Suppose we want to select n instances out of N such that every one of the $\binom{N}{n}$ distinct samples has an equal chance of being drawn. In practice, a random sample is drawn instance by instance. Since an instance that has been drawn is removed from the data set for all subsequent draws, this method is also called random sampling without replacement. Random sampling with replacement is also feasible: at any draw, all N instances of the data set are given an equal chance of being drawn, no matter how often they have already been drawn.
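The two drawing schemes can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the data set of integer identifiers, and the fixed seed are all assumptions made for the example. The standard library offers `random.sample` (without replacement) and `random.choices` (with replacement) for the same purpose.

```python
import math
import random


def draw_without_replacement(data, n, rng):
    """Draw n instances one by one; a drawn instance leaves the pool."""
    pool = list(data)  # copy so the caller's data set is not modified
    sample = []
    for _ in range(n):
        idx = rng.randrange(len(pool))  # every remaining instance equally likely
        sample.append(pool.pop(idx))
    return sample


def draw_with_replacement(data, n, rng):
    """Draw n instances one by one; all N instances stay available at each draw."""
    return [data[rng.randrange(len(data))] for _ in range(n)]


N, n = 10, 3
dataset = list(range(N))  # hypothetical instance identifiers 0..N-1
rng = random.Random(0)    # fixed seed, only to make the sketch repeatable

# Number of distinct samples of size n drawn without replacement: C(N, n).
num_distinct_samples = math.comb(N, n)

sample_wor = draw_without_replacement(dataset, n, rng)
sample_wr = draw_with_replacement(dataset, n, rng)

print(num_distinct_samples, sample_wor, sample_wr)
```

A sample drawn without replacement never contains the same instance twice, whereas a sample drawn with replacement may.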

Stratified Sampling:

In this sampling the heterogeneous data set of N instances is first divided into k homogeneous subsets of sizes $n_1, n_2, \ldots, n_k$. The subsets are called strata. These subsets are non-overlapping, and together they comprise the whole of the data set (i.e., $\sum_{i=1}^{k} n_i = N$) [17]. The instances are sampled at random from each of these strata; the sample size in each stratum varies according to the relative importance of the stratum in the population. The sample, which is the aggregate of the sampled instances of
