Data Reduction - Data Preprocessing in Data Mining

Graphics Reference

In-Depth Information

In the case of classification, the class information can be included in the distance as

d rs = (

1

− α)

d rs + α

c rs

where c rs is the “distance” between the classes x r and x s belong to. This interclass

distance should be supplied subjectively and

α

could be optimized using CV.

6.2.4 Locally Linear Embedding

Locally Linear Embedding (LLE) recovers global nonlinear structure from locally

linear fits [ 25 ]. Its main idea is that each local patch of the manifold can be approxi-

mated linearly and given enough data, each point can be written as a linear, weighted

sum of its neighbors.

The LLE algorithm is based on simple geometric intuitions. Suppose the data

consists of N real-valued vectors X i , each of dimensionality D , sampled from some

smooth underlying manifold. It is expected that each data point and its neighbors

to lie on or close to a locally linear patch of the manifold. The local geometry of

these patches can be characterized by linear coefficients that reconstruct each data

point from its neighbors. In the simplest formulation of LLE, the KNN are estimated

per data point, as measured by Euclidean distance. Reconstruction errors are then

measured by the cost function:

2

ε(

W

) =

X i −

W ij X j

i

j

which adds up the squared distances between all the data points and their recon-

structions. The weights W ij summarize the contribution of the j th data point to the

i sth reconstruction. To compute the weights W ij , it is necessary to minimize the cost

function subject to two constraints: first, that each data point X i is reconstructed only

from its neighbors, enforcing W ij =

0if X j does not belong to this set; second, that

the rows of the weight matrix sum to one: j W ij =

1 s. The optimal weights W ij

subject to these constraints are found by solving a least squares problem.

The constrained weights that minimize these reconstruction errors are invariant

to rotations, scaling, and translations of that data point and its neighbors. Suppose

the data lie on or near a smooth nonlinear manifold of dimensionality d

D .

To achieve a good approximation, then, there exists a linear mapping that maps the

high dimensional coordinates of each neighborhood to global internal coordinates on

the manifold. By design, the reconstruction weights W ij reflect intrinsic geometric

properties of the data that are invariant to exactly such transformations. We therefore

expect their characterization of local geometry in the original data space to be equally

valid for local patches on the manifold. In particular, the same weights W ij that

Data Preprocessing in Data Mining

Search WWH ::

Custom Search

Home