stars, gas, and dust. The number of stars typically varies in the range of $10^7$ to $10^{14}$. Edwin Hubble introduced a morphological classification scheme, which became famous as the Hubble sequence. Neighboring classes in this diagram represent galaxies with similar shapes. Hubble's classification scheme differentiates between three main classes: (1) elliptical galaxies, (2) spiral galaxies, and (3) lenticular galaxies, see [15]. For our experiment we employ images of galaxies from the Sloan Digital Sky Survey (SDSS), a collection of millions of astronomical objects [1]. Figure 5 shows the UNN$_g$ embedding of 100 images of galaxies from the SDSS database. Each image is a vector of $40 \times 40$ RGB values, i.e., the data space dimensionality is $d = 4{,}800$. The figure shows every 12th galaxy. We can observe that galaxies that belong to one class according to Hubble's classification scheme are neighbors on the low-dimensional manifold. Elliptical galaxies start from the left, while lenticular shapes are placed on the right-hand side, a sorting that is similar to the Hubble taxonomy.
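To make the dimensionality computation concrete, the following sketch (our own NumPy illustration, not part of the original experiment) flattens a placeholder $40 \times 40$ RGB image into a single pattern vector of dimensionality $d = 4{,}800$:

```python
import numpy as np

# Placeholder for a 40x40 RGB galaxy cutout (real data would come from SDSS).
image = np.random.rand(40, 40, 3)

# Flattening yields one pattern in the high-dimensional data space.
y = image.reshape(-1)
print(y.shape)  # (4800,) since d = 40 * 40 * 3 = 4,800
```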
4 Robust Loss Functions
Loss functions play an important part in machine learning, as they define the error, and thus the design objective while training a functional model. In particular, in the presence of noise the choice of an appropriate loss function parameterization is important. In this section we extend UNN regression by the $\epsilon$-insensitive loss.
4.1 The $\epsilon$-Insensitive Loss
In the case of noisy data sets, over-fitting effects may occur. The $\epsilon$-insensitive loss makes it possible to ignore errors up to a level of $\epsilon$, and avoids over-fitting to curvatures of the data that may only be caused by noise effects. With the design of a loss function, the emphasis of outliers can be controlled. First, the residuals are computed. In the case of unsupervised regression, the error is computed in two steps:
1. The distance function $\delta: \mathbb{R}^q \times \mathbb{R}^d \to \mathbb{R}$ maps the difference between the prediction $f(\mathbf{x})$ and the desired output value $\mathbf{y}$ to a value according to the distance w.r.t. a certain measure. We employ the Minkowski metric:

$$\delta(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{N} \left| f(\mathbf{x}_i) - \mathbf{y}_i \right|^p \right)^{1/p}, \qquad (5)$$

which corresponds to the Manhattan distance for $p = 1$, and to the Euclidean distance for $p = 2$.
2. The loss function $L: \mathbb{R} \to \mathbb{R}$ maps the residuals to the learning error. With the design of the loss function the influence of residuals can be controlled. In the best case the loss function is chosen according to the requirements of the underlying data mining model. Often, low residuals are penalized less than high residuals (e.g., with a quadratic function). We will concentrate on the $\epsilon$-insensitive loss in the following; a code sketch of both steps is given after this list.
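To make the two-step error computation concrete, here is a minimal NumPy sketch of the Minkowski distance of Eq. (5) together with the standard $\epsilon$-insensitive loss $L_\epsilon(r) = \max(0, |r| - \epsilon)$. The function names and the NumPy realization are our own illustration, and the loss formula is the usual textbook definition rather than a quotation from this section:

```python
import numpy as np

def minkowski_distance(predictions, targets, p=2):
    """Minkowski distance of Eq. (5): p=1 is Manhattan, p=2 is Euclidean."""
    residuals = np.abs(predictions - targets)
    return np.sum(residuals ** p) ** (1.0 / p)

def eps_insensitive_loss(residual, eps=0.1):
    """Standard eps-insensitive loss: residuals up to eps are ignored."""
    return max(0.0, abs(residual) - eps)

# Usage: compare noisy predictions f(x_i) to desired outputs y_i.
f_x = np.array([1.05, 2.00, 2.95])  # predictions f(x_i)
y = np.array([1.00, 2.00, 3.00])    # desired outputs y_i
delta = minkowski_distance(f_x, y, p=2)
print(delta, eps_insensitive_loss(delta, eps=0.1))
```

Small residuals inside the $\epsilon$-tube contribute no error, which is exactly what protects the model from fitting noise-induced curvature.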