Let r be the residual, i.e., the distance δ in data space. L1(r) = |r| and L2(r) = r^2 are often employed as loss functions. We will use the L2 loss for measuring the final DSRE, but concentrate on the ε-insensitive loss Lε during training of the UNN model. The loss Lε is defined as:

Lε(r) = 0 if |r| < ε, and Lε(r) = |r| − ε if |r| ≥ ε.  (6)

Lε is not differentiable at |r| = ε. In contrast to the L1 and the L2 loss, Lε ignores residuals below ε and thus avoids over-adaptation to noise.
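The three loss functions can be sketched numerically as follows; the function name is illustrative and not taken from the original implementation.

```python
import numpy as np

def eps_insensitive_loss(r, eps):
    """Epsilon-insensitive loss (Eq. 6): zero inside the eps-tube,
    linear (|r| - eps) outside it."""
    return np.where(np.abs(r) < eps, 0.0, np.abs(r) - eps)

# Comparison with the L1 and L2 losses on a few residuals:
r = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
l1 = np.abs(r)                          # L1 loss
l2 = r ** 2                             # L2 loss
le = eps_insensitive_loss(r, eps=1.0)   # -> [1., 0., 0., 0., 1.]
```

Note how all residuals with |r| < 1.0 contribute zero loss, which is the mechanism that suppresses over-adaptation to small, noise-induced residuals.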
4.2 Experiments
In the following, we concentrate on the influence of loss functions on the UNN learning results. To this end, we evaluate the final embedding in three ways: we measure the final L2-based DSRE, visualize the results by colored embeddings, and show the latent order of the embedded objects. Again, we concentrate on two data sets, i.e., a 3D-S data set with noise and the USPS handwritten digits.
3D-S with Noise. In the first experiment we concentrate on the 3D-S data set. Noise is modeled by multiplying each data point y of the 3D-S with a random value drawn from the Gaussian distribution: y' = N(0, σ) · y. Table 2 shows the experimental results of UNN and UNN_g concentrating on the ε-insensitive loss for K = 5 and the Minkowski metric with p = 2 on the 3D-S data set with hole (3D-S_h). The left part shows the results for 3D-S without noise, the right part shows the results with noise (σ = 5.0). At first, we concentrate on the experiments without noise. We can observe that (1) the DSRE achieved by UNN is minimal for the lowest ε, and (2) for UNN_g low DSRE values are achieved with increasing ε (up to a limit of ε = 3.0), but the best DSRE of UNN_g is worse than the best of UNN. Observation (1) can be explained as follows: without noise, ignoring residuals is disadvantageous for UNN, since all intermediate positions are tested and a good local optimum can be reached anyway. From observation (2) we can conclude that a good strategy against the local optima of UNN_g is to ignore residuals below the threshold ε.

For the experiments with noise of magnitude σ = 5.0 we can observe a local DSRE minimum: at ε = 0.8 in the case of UNN, and at ε = 3.0 in the case of UNN_g. For UNN, local optima caused by noise can be avoided by ignoring residuals; for UNN_g this is already the case without noise. Furthermore, for UNN_g we observe the optimum at the same level of ε.
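The multiplicative Gaussian noise model described above can be sketched as follows; the function name and the use of NumPy's random generator are assumptions for illustration, with σ = 5.0 matching the setting in the text.

```python
import numpy as np

def apply_multiplicative_noise(Y, sigma, rng=None):
    """Multiply each data point (row of Y) by a single random factor
    drawn from the Gaussian distribution N(0, sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    # One scalar factor per point, broadcast across all dimensions:
    factors = rng.normal(loc=0.0, scale=sigma, size=(Y.shape[0], 1))
    return factors * Y

# Example: perturb a small 3D point cloud with sigma = 5.0
Y = np.ones((4, 3))
Y_noisy = apply_multiplicative_noise(Y, sigma=5.0)
```

Because each point is scaled as a whole, the noisy point stays on the ray through the origin and the original point, but its distance to its former neighbors can change drastically, which is what disturbs the S-structure in the noisy experiments.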
Figures 6 (a) and (b) show embeddings of UNN and UNN_g without noise, with the settings ε = 0.2 and ε = 3.0, corresponding to the settings of Table 2 that are shown in bold. Similar colors correspond to neighboring embeddings in latent space. The visualization shows that for both embeddings neighboring points in data space have similar colors, i.e., they correspond to neighboring latent points. The UNN embedding results in a lower DSRE. This can hardly be recognized from the visualization; only the blue points of UNN_g seem to be misplaced on the upper and lower part of the 3D-S.
Figures 7 (a) and (b) show the visualization of the UNN embeddings on the noisy 3D-S. The structure of the 3-dimensional S is obviously disturbed. Nevertheless, neighboring parts in data space are assigned similar colors. Again, the UNN embedding