feature combinations in general is out of the question. Several alternative
search strategies for FS, employing the cost functions from Section 2.3.2, will
be summarized with regard to achievable performance and required compu-
tational effort.
First-Order Selection Techniques. One simple but often effective way of finding a suboptimal solution with minimum effort is to compute an individual figure of merit for each feature. This first-order approach neglects possible higher-order correlations between feature pairs or feature tuples. For the assessment, i.e., the computation of the figure of merit, one of the cost functions given in the previous subsection is applied; in this simplified case, however, the cost function is computed separately for each feature. Three variants are basically feasible:
• The figure of merit is computed for a selected feature and a selected combination of classes, i.e., the feature's contribution to pairwise class discrimination is assessed. For instance, the measure $q_{ij}^{x_l}$ could be computed here. For each class pair, the features are ranked according to their individual merit, and a selection from these rank tables can be made, for instance, by choosing all features in first-rank position. Table 2.1 gives an example of this first-order selection scheme for the well-known Iris data (see also the code sketch after this list). Obviously, for the first-rank position R, features 3 and 4 will be selected. The method can be computed very quickly, but the rank table grows for a given feature number M and class number L as $M \cdot L(L-1)/2$.
• The figure of merit is computed for a selected feature and for the discrimination of one class versus all others. The corresponding rank table grows for a given feature number M and class number L as $M \cdot L$.
• Computing the figure of merit with regard to discriminating all classes for each feature returns a single column with M elements.
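To make the pairwise scheme concrete, the following sketch (Python with NumPy) builds such a rank table. It is only an illustration under the assumption that a simple Fisher-like separability ratio may stand in for the overlap measure $q_{ij}^{x_l}$ defined in Section 2.3.2; the helper names and the generated toy data are hypothetical and do not reproduce the Iris values of Table 2.1.

```python
# Illustrative sketch of first-order, pairwise feature assessment.
# The Fisher-like ratio below is a hypothetical stand-in for the
# parametric overlap measure q_ij^{x_l}; it is NOT the definition
# from Section 2.3.2.
from itertools import combinations

import numpy as np


def fisher_like_score(a, b):
    """Separability of two one-dimensional class samples (larger is better)."""
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)


def pairwise_rank_table(X, y):
    """Figure of merit per feature and class pair (first scheme in the list).

    Returns {(i, j): [(feature_index, score), ...]} with each list sorted
    by descending merit, i.e., position 0 is the first-rank feature R.
    """
    table = {}
    for i, j in combinations(np.unique(y), 2):
        scores = [(l, fisher_like_score(X[y == i, l], X[y == j, l]))
                  for l in range(X.shape[1])]
        table[(int(i), int(j))] = sorted(scores, key=lambda s: s[1], reverse=True)
    return table


# Toy data with M = 4 features and L = 3 classes, so the rank table holds
# M * L * (L - 1) / 2 = 12 entries (cf. the growth rule stated above).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 50)
X = rng.normal(size=(150, 4)) + y[:, None] * np.array([0.1, 0.2, 1.0, 1.2])

ranks = pairwise_rank_table(X, y)
# "Choose all features in first-rank position" across the class pairs
# (feature indices are 0-based here):
selected = {cols[0][0] for cols in ranks.values()}
print(selected)
```

The one-versus-rest and all-classes variants from the list can be obtained with the same kind of score by changing only how the samples are grouped, which shrinks the table to $M \cdot L$ or M entries, respectively.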
As shown in Table 2.1, the parametric overlap measure $q_{ij}^{x_l}$ and its variants can serve all three approaches to fast first-order feature selection. If the parametric assumption is met, this simple scheme can be very effective. However, in many practical cases, even the one-dimensional distributions of the individual features turn out to be nonparametric in nature. An effective remedy for this situation is the application of, e.g., the overlap
Table 2.1. Rank table from first-order assessment for Iris data
(R: rank of the feature for the given class pair; C i-j: figure of merit for discriminating classes i and j).

Feature   R   C 1-2   R   C 1-3   R   C 2-3
x_1       4   1.020   3   1.482   3   0.442
x_2       3   1.065   4   0.890   4   0.255
x_3       2   4.139   1   5.451   2   1.218
x_4       1   4.387   2   5.180   1   1.660
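The nonparametric case mentioned above can be illustrated by a histogram-based overlap estimate for a single feature and class pair. This is only a sketch under the assumption that a coarse histogram intersection is an acceptable approximation of the one-dimensional distribution overlap; the function below is hypothetical and is not one of the variants defined in Section 2.3.2.

```python
# Hypothetical nonparametric overlap estimate for one feature and one
# class pair; histogram intersection is used here only as an illustration.
import numpy as np


def histogram_overlap(a, b, bins=20):
    """Overlap of two one-dimensional samples via normalized histograms
    on a common bin grid (0 = fully separated, 1 = identical)."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    ha, _ = np.histogram(a, bins=bins, range=(lo, hi))
    hb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    return float(np.minimum(ha / ha.sum(), hb / hb.sum()).sum())


# Example call for two well-separated synthetic classes:
rng = np.random.default_rng(1)
print(histogram_overlap(rng.normal(0.0, 1.0, 50), rng.normal(2.5, 1.0, 50)))
```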
 