Advanced Methods for the Analysis of Semiconductor Manufacturing Process Data - Advanced Techniques in Knowledge Discovery and Data Mining


applied to semiconductor application data for basic feasibility demonstration in Section 2.4.

2.3.2 Dimensionality Reduction and Interactive Visualization

Motivation. In addition to semiconductor manufacturing, a wide variety of other technical problems are characterized by typically large sets of high-dimensional data, obtained, e.g., from sensor registration, medical laboratory parameters, manufacturing process parameters, financial databases, measurements, or other generally observed features. With regard to the given application, the significance, correlations, redundancy, and irrelevance of the variables $x_i$ are unknown a priori. Extracting the underlying knowledge, or classifying the data reliably and automatically, requires reducing the initial data set to the essential information and the corresponding variables. This holds especially because the well-known curse of dimensionality (COD) [2.12] makes compaction of the data a mandatory prerequisite for reliable decision making. Unsupervised and supervised methods can be employed for this reduction step, for both interactive and automatic processing of the data. Exploiting the remarkable human perceptive and associative capabilities for the complex problem of identifying nonobvious correlations, structure, and hidden knowledge in the data can be a powerful complement to existing computational methods. Of course, this requires an appropriate visual representation, which can be achieved by means of dimensionality reduction or multivariate projection methods combined with interactive visualization of the data [2.49].
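As a minimal sketch of such a multivariate projection, assuming principal component analysis (PCA) as the linear mapping (the text does not prescribe a particular method at this point), the following pure-Python code projects $M$-dimensional vectors onto the two directions of largest variance, giving 2-D coordinates that could be shown in an interactive scatter plot. The function names are hypothetical; the eigenvectors are approximated by power iteration with deflation so that no external library is needed.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _center(data: Matrix) -> Matrix:
    """Subtract the per-variable mean from every sample."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in data]


def _covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix (M x M) of mean-centered data."""
    n, m = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]


def _leading_eigenpair(cov: Matrix, iters: int = 500) -> Tuple[float, List[float]]:
    """Power iteration: approximate the largest eigenvalue and its unit eigenvector."""
    m = len(cov)
    v = [1.0 / (j + 1) for j in range(m)]       # arbitrary non-zero start vector
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:                          # matrix maps v to zero; stop early
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
    return eigval, v


def pca_project_2d(data: Matrix) -> Matrix:
    """Project N x M data onto its first two principal components (N x 2)."""
    centered = _center(data)
    cov = _covariance(centered)
    m = len(cov)
    components = []
    for _ in range(2):
        lam, v = _leading_eigenpair(cov)
        components.append(v)
        # Deflation: remove the found component so the next run finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return [[sum(r[j] * c[j] for j in range(m)) for c in components] for r in centered]
```

For data that already lie in a two-dimensional plane, this projection preserves all pairwise Euclidean distances; for genuinely high-dimensional data it keeps only the variance captured by the two leading components, so nonlinear mappings may reveal structure that a linear projection hides.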

Typical database representations, e.g., an Excel spreadsheet, are not easily amenable to human perception and understanding. This is illustrated in Fig. 2.10, together with an alternative, human-adapted visual representation of the same database. Thus, dimensionality reduction is a ubiquitous problem and, together with multivariate data visualization, has been a topic of interdisciplinary research for more than three decades. Applications of high economic interest, such as the one investigated in this work and other data mining and knowledge discovery applications, give the field renewed and strong incentive. Numerous methods for dimensionality reduction have been derived in the past; they differ considerably with regard to methodology, computational complexity, transparency, and ease of use. In this work, effective methods promising the largest productivity increase will be preferred. The following common definitions of the two main groups of dimensionality reduction methods, briefly adapted from [2.16], clarify the pursued objectives. For a given sample set $X$ of $N$ $M$-dimensional feature vectors $\mathbf{x} = [x_1, x_2, \ldots, x_M]^T$, feature extraction is defined as a transformation $A$ that maximizes a criterion $J$:

$$
J(A^{*}) = \max_{A} J\bigl(A(\mathbf{x})\bigr) \tag{2.5}
$$

and the special case of feature selection, in which the transformation $A_S$ merely selects a subset of the original variables $x_i$, is defined as

$$
J(A_S^{*}) = \max_{A_S} J\bigl(A_S(\mathbf{x})\bigr) . \tag{2.6}
$$
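The definitions leave the criterion $J$ open. As a minimal sketch, assuming the leave-one-out accuracy of a 1-nearest-neighbor classifier as a hypothetical choice of $J$, the following code performs feature selection in the sense of Eq. (2.6) by exhaustively evaluating every subset of $m$ of the $M$ variables and keeping the maximizer. The function names are illustrative only.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def loo_1nn_accuracy(data: List[List[float]], labels: Sequence[int],
                     features: Sequence[int]) -> float:
    """Hypothetical criterion J: leave-one-out accuracy of a 1-nearest-neighbor
    classifier that uses only the variables listed in `features`."""
    n = len(data)
    correct = 0
    for i in range(n):
        best_dist, best_label = float("inf"), None
        for j in range(n):
            if j == i:                      # leave sample i out
                continue
            d = sum((data[i][f] - data[j][f]) ** 2 for f in features)
            if d < best_dist:
                best_dist, best_label = d, labels[j]
        if best_label == labels[i]:
            correct += 1
    return correct / n


def select_features(data: List[List[float]], labels: Sequence[int],
                    m: int) -> Tuple[Tuple[int, ...], float]:
    """Exhaustive feature selection: evaluate J on every m-subset of the M
    variables and return the maximizing subset (A_S*) with its score."""
    n_vars = len(data[0])
    best_subset: Tuple[int, ...] = ()
    best_score = -1.0
    for subset in combinations(range(n_vars), m):
        score = loo_1nn_accuracy(data, labels, subset)
        if score > best_score:              # ties keep the earlier subset
            best_subset, best_score = subset, score
    return best_subset, best_score
```

For example, with `data = [[5.0, 0.0], [1.0, 0.1], [3.0, 0.2], [4.0, 1.0], [0.0, 1.1], [2.0, 1.2]]` and `labels = [0, 0, 0, 1, 1, 1]`, `select_features(data, labels, 1)` returns `((1,), 1.0)`, because only the second variable separates the two classes. Exhaustive search evaluates $\binom{M}{m}$ subsets, which quickly becomes infeasible for large $M$; suboptimal strategies such as sequential forward selection are therefore common in practice.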
