tion error. For data analysis, the aim is to discover all involved variables and the underlying knowledge. Feature selection can be understood as the computation of a constrained matrix A_S for a linear mapping of the following form:

$$
A_S = \begin{pmatrix}
c_1 & 0 & \cdots & 0 \\
0 & c_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & c_M
\end{pmatrix},
\qquad (2.13)
$$

where only the diagonal elements can have nonzero values and the c_i ∈ {0, 1} are binary variables, or switch variables, determined by a preceding optimization process. Thus, a linear mapping y = Ax is constituted. However, due to the constrained matrix A and the fact that column vectors with c_i = 0 can be omitted entirely, the computation can be simplified to y = [y_1, y_2, ..., y_m]^T with y_i = x_j ∀ c_j ≠ 0, i.e., m corresponds to the number of c_j ≠ 0, and the corresponding features x_j are just copied to the y_i.
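To make this concrete, the following is a minimal sketch in Python/NumPy, with a hypothetical switch vector c and feature vector x, showing that applying the constrained matrix A_S of (2.13) amounts to simply copying the selected components:

```python
import numpy as np

# Hypothetical switch variables c_i in {0, 1} for M = 5 features,
# e.g., as delivered by a preceding optimization process.
c = np.array([1, 0, 1, 0, 1])
x = np.array([0.7, 2.3, -1.1, 0.4, 3.0])  # one feature vector, dimension M

# Literal form: y = A_S x with A_S = diag(c_1, ..., c_M).
A_S = np.diag(c)
y_full = A_S @ x                # still M-dimensional, zeros where c_i = 0

# Simplified form: omit the columns with c_i = 0 entirely, so that
# y = [y_1, ..., y_m]^T just copies the m selected features x_j.
y = x[c != 0]                   # m-dimensional, m = number of c_j != 0

assert np.array_equal(y, y_full[c != 0])
```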
Feature selection thus performs a scaling of the feature or coordinate axes by binary variables, i.e., it switches off dimensions and thereby defines a subspace that is salient with regard to the chosen criterion J. As no rotation of the basis vectors is carried out, explicit interpretability of the result is sustained. However, due to the binary nature of the selection process, differences in the importance or impact of individual features are occluded. A straightforward extension of the binary matrix A_S given for feature selection is feasible, which allows a continuous-valued ranking of the features: the binary c_i are replaced by real variables a_i ∈ [0, 1], which are determined by a preceding optimization process. The limitation or normalization to [0, 1] is introduced for the sake of interpretability and comparison with corresponding feature selection results. This approach, commonly denoted as feature weighting (FW), allows a continuous scaling of the feature or coordinate axes for a_i ≠ 0. Those columns with a_i = 0 can be omitted, reducing the matrix from M × M to M × m with m ≤ M.
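Analogously, a minimal sketch of the weighted mapping, under the same hypothetical setup, with real-valued weights a_i ∈ [0, 1] in place of the binary switches:

```python
import numpy as np

# Hypothetical weights a_i in [0, 1], again assumed to come from a
# preceding optimization process; a_i = 0 switches a feature off,
# intermediate values scale its axis continuously.
a = np.array([1.0, 0.0, 0.35, 0.0, 0.8])
x = np.array([0.7, 2.3, -1.1, 0.4, 3.0])

# Columns with a_i = 0 can be omitted, reducing the M x M weighting
# matrix to an M x m one (here applied directly to the feature vector).
keep = a != 0
y = a[keep] * x[keep]           # m-dimensional weighted feature vector

# Equivalent literal mapping with the full diagonal matrix:
assert np.allclose(y, (np.diag(a) @ x)[keep])
```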
Thus, in addition to the aspired higher discrimination and better generalization properties, explicit salient information for data analysis purposes and for rule weighting is extracted by this method. One particular method of finding appropriate a_i based on a certain cost function J and a gradient descent technique can be found in [2.21]. Numerous other options with regard to the chosen J and the optimization strategy, e.g., evolutionary computation, are feasible [2.45] and are currently being pursued in ongoing work. Various strategies and methods for feature selection will be discussed after the presentation of relevant cost functions J.
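As an illustration of such a gradient-based determination of the a_i, the following sketch uses an assumed, differentiable stand-in cost (a simple ratio of between-class to within-class scatter in the weighted feature space); it is not the J of [2.21], and all names and parameters are illustrative:

```python
import numpy as np

def J(a, X, labels):
    """Assumed stand-in cost: between-class to within-class scatter
    ratio of the weighted features (illustrative, not the J of [2.21])."""
    Xw = X * a                                  # scale each feature axis by a_i
    mu = Xw.mean(axis=0)
    between, within = 0.0, 0.0
    for cls in np.unique(labels):
        Xc = Xw[labels == cls]
        mu_c = Xc.mean(axis=0)
        between += len(Xc) * np.sum((mu_c - mu) ** 2)
        within += np.sum((Xc - mu_c) ** 2)
    return between / within

def optimize_weights(X, labels, steps=200, lr=0.05, eps=1e-4):
    """Gradient ascent on J, with the a_i constrained to [0, 1]."""
    M = X.shape[1]
    a = np.full(M, 0.5)                         # start all weights mid-range
    for _ in range(steps):
        grad = np.zeros(M)                      # numerical gradient of J
        for i in range(M):
            d = np.zeros(M)
            d[i] = eps
            grad[i] = (J(a + d, X, labels) - J(a - d, X, labels)) / (2 * eps)
        a = np.clip(a + lr * grad, 0.0, 1.0)    # ascent step, keep a_i in [0, 1]
    return a

# Hypothetical data: two classes that differ only in feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:50, 0] += 3.0
labels = np.repeat([0, 1], 50)
print(optimize_weights(X, labels))              # feature 0 should get the largest weight
```

The numerical gradient keeps the sketch independent of the concrete form of J; an analytical gradient is preferable whenever J permits it.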
Cost Functions. In the following, from a larger collection of potential cost or assessment functions summarized in Fig. 2.12, dedicated cost functions for feature space assessment introduced in prior work, e.g., [2.27], [2.28], and [2.22], will be briefly presented for the sake of a self-contained presentation. These serve to measure discrimination in terms of the separability of class regions,