Advanced Methods for the Analysis of Semiconductor Manufacturing Process Data - Advanced Techniques in Knowledge Discovery and Data Mining

Database Reference

In-Depth Information

∆y iq ( m )= ∂E ( m )

∂y iq ( m )

∂ 2 E ( m )

∂y iq ( m ) 2

and 0 <MF≤

1 .

(2.12)

In particular for large databases, due to the underlying computational com-

plexity of the standard methods, e.g., the NLM with O( N 2 ), mapping compu-

tation becomes infeasible. Therefore, particular interest was placed on heuris-

tic and hierarchical methods of dimensionality reduction as mapping accel-

erators.

One of the first heuristic accelerating methods of the NLM was pub-

lished by Lee, Slaggle, and Blum [2.34]. Rightly assuming that the gradient

procedure does not always achieve an accurate projection (cf., e.g., [2.9]),

they developed a fast distance-preserving mapping that focuses on the ex-

act preservation of only a limited number of 2 N −

3 distances, neglecting

all remaining ones. For this mapping, the minimum spanning tree (MST) of

the data distance graph is computed. Points are mapped by common trian-

gulation while traversing the MST, based on the previously mapped MST

neighbors serving as pivot point. However, in spite of the appealing heuris-

tic idea, MST computation and traversal itself still has O(N 2 )complexity.

Thus, in own prior work, an even faster mapping algorithm was developed

[2.26]. This alternative mapping, denoted as Visor mapping, also uses a tri-

angulation mapping step, but with three fixed global pivot points that are

heuristically chosen from the data set. The purpose of the pivot point de-

termination is to find the three most extruded data points that meet the

additional constraint of maximum mutual distance while enclosing the re-

maining data set. Based on centroid computation, these three data points

are successively selected as pivot points from the data set. These points are

placed first and the remaining N-3 data points are placed in the visualization

plane employing triangulation.

This algorithm, denoted by Visor [2.26], has O(N) complexity and thus

provides data projections with a very short response time and negligible sensi-

tivity to the database size. As shown by prior investigations with a mapping

quality measure, achievable mapping quality is similar to the NLM [2.26],

[2.24]. Due to their salient properties with regard to speed, convenience, and

transparence, distance-preserving mappings have been applied throughout

this work to the regarded semiconductor manufacturing data. In addition,

e cient hierarchical methods, offering a more delicate speed-accuracy trade-

off are available [2.23] and will be employed in the next stages of the work.

The unsupervised mapping methods discussed so far retain all features

from the high-dimensional feature space and compute a more compact opti-

mized feature space, e.g., for visualization and analysis purposes.

In contrast to this, feature selection actually helps to discard incoming

variables that have no or little significance for the tackled problem. It must

be remembered, that two very different aims can be pursued by the method

of feature selection. For classification tasks, the selection of an as-small-as-

possible group is desired, to allow generalization with a minimum classifica-

Advanced Techniques in Knowledge Discovery and Data Mining

Search WWH ::

Custom Search

Home