Let r be the residual, i.e., the distance δ in data space. L1(r) = |r| and L2(r) = r^2 are often employed as loss functions. We will use the L2 loss for measuring the final DSRE, but concentrate on the ε-insensitive loss Lε during training of the UNN model. The loss Lε is defined as:

Lε(r) = 0 if |r| < ε, and Lε(r) = |r| − ε if |r| ≥ ε.  (6)

Lε is not differentiable at |r| = ε. In contrast to the L1 and the L2 loss, Lε ignores residuals below ε and thus avoids over-adaptation to noise.
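The three loss functions can be sketched numerically as follows; the function name is illustrative and not taken from the original implementation.

```python
import numpy as np

def eps_insensitive_loss(r, eps):
    """Epsilon-insensitive loss (Eq. 6): zero inside the eps-tube,
    linear (|r| - eps) outside it."""
    return np.where(np.abs(r) < eps, 0.0, np.abs(r) - eps)

# Comparison with the L1 and L2 losses on a few residuals:
r = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
l1 = np.abs(r)                          # L1 loss
l2 = r ** 2                             # L2 loss
le = eps_insensitive_loss(r, eps=1.0)   # -> [1., 0., 0., 0., 1.]
```

Note how all residuals with |r| < 1.0 contribute zero loss, which is the mechanism that suppresses over-adaptation to small, noise-induced residuals.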
4.2 Experiments
In the following, we concentrate on the influence of loss functions on the UNN learning results. To this end, we evaluate the final embedding in three ways: we measure the final L2-based DSRE, visualize the results by colored embeddings, and show the latent order of the embedded objects. Again, we concentrate on two data sets, i.e., a 3D-S data set with noise and the USPS handwritten digits.
3D-S with Noise. In the first experiment we concentrate on the 3D-S data set. Noise is modeled by multiplying each data point y of the 3D-S with a random value drawn from the Gaussian distribution: y' = N(0, σ) · y. Table 2 shows the experimental results of UNN and UNN_g concentrating on the ε-insensitive loss for K = 5 and the Minkowski metric with p = 2 on the 3D-S data set with hole (3D-S_h). The left part shows the results for 3D-S without noise, the right part shows the results with noise (σ = 5.0). At first, we concentrate on the experiments without noise. We can observe that (1) the DSRE achieved by UNN is minimal for the lowest ε, and (2) for UNN_g low DSRE values are achieved with increasing ε (up to a limit of ε = 3.0), but the best DSRE of UNN_g is worse than the best of UNN. Observation (1) can be explained as follows: without noise, ignoring residuals is disadvantageous for UNN, since all intermediate positions are tested and a good local optimum can be reached anyway. From observation (2) we can conclude that a good strategy against the local optima of UNN_g is to ignore residuals below the threshold ε.

For the experiments with noise of magnitude σ = 5.0 we can observe a local DSRE minimum: at ε = 0.8 in the case of UNN, and at ε = 3.0 in the case of UNN_g. For UNN, local optima caused by noise can be avoided by ignoring residuals; for UNN_g this is already the case without noise. Furthermore, for UNN_g we observe the optimum at the same level of ε.
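The multiplicative Gaussian noise model described above can be sketched as follows; the function name and the use of NumPy's random generator are assumptions for illustration, with σ = 5.0 matching the setting in the text.

```python
import numpy as np

def apply_multiplicative_noise(Y, sigma, rng=None):
    """Multiply each data point (row of Y) by a single random factor
    drawn from the Gaussian distribution N(0, sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    # One scalar factor per point, broadcast across all dimensions:
    factors = rng.normal(loc=0.0, scale=sigma, size=(Y.shape[0], 1))
    return factors * Y

# Example: perturb a small 3D point cloud with sigma = 5.0
Y = np.ones((4, 3))
Y_noisy = apply_multiplicative_noise(Y, sigma=5.0)
```

Because each point is scaled as a whole, the noisy point stays on the ray through the origin and the original point, but its distance to its former neighbors can change drastically, which is what disturbs the S-structure in the noisy experiments.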
Figures 6 (a) and (b) show embeddings of UNN and UNN_g without noise, with the settings ε = 0.2 and ε = 3.0, corresponding to the settings of Table 2 that are shown in bold. Similar colors correspond to neighboring embeddings in latent space. The visualization shows that for both embeddings neighboring points in data space have similar colors, i.e., they correspond to neighboring latent points. The UNN embedding results in a lower DSRE. This can hardly be recognized from the visualization; only the blue points of UNN_g seem to be misplaced on the upper and lower part of the 3D-S.
Figures 7 (a) and (b) show the visualization of the UNN embeddings on the noisy 3D-S. The structure of the 3-dimensional S is obviously disturbed. Nevertheless, neighboring parts in data space are assigned similar colors. Again, the UNN embedding