stars, gas, and dust. The number of stars typically varies in the range of $10^7$ to $10^{14}$. Edwin Hubble introduced a morphological classification scheme, which became famous as the Hubble sequence. Neighboring classes in this diagram represent galaxies with similar shapes. Hubble's classification scheme differentiates between three main classes: (1) elliptical galaxies, (2) spiral galaxies, and (3) lenticular galaxies, see [15]. For our experiment we employ images of galaxies from the Sloan Digital Sky Survey (SDSS), a collection of millions of astronomical objects [1]. Figure 5 shows the UNN$_g$ embedding of 100 images of galaxies from the SDSS database. Each image is a vector of $40 \times 40$ RGB values, i.e., the data space dimensionality is $d = 4{,}800$. The figure shows every 12th galaxy. We can observe that galaxies that belong to one class according to Hubble's classification scheme are neighbors on the low-dimensional manifold. Elliptical galaxies start from the left, while lenticular shapes are placed on the right-hand side, a sorting that is similar to the Hubble taxonomy.
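To make the dimensionality computation concrete, the following sketch (our own NumPy illustration, not part of the original experiment) flattens a placeholder $40 \times 40$ RGB image into a single pattern vector of dimensionality $d = 4{,}800$:

```python
import numpy as np

# Placeholder for a 40x40 RGB galaxy cutout (real data would come from SDSS).
image = np.random.rand(40, 40, 3)

# Flattening yields one pattern in the high-dimensional data space.
y = image.reshape(-1)
print(y.shape)  # (4800,) since d = 40 * 40 * 3 = 4,800
```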
4 Robust Loss Functions
Loss functions play an important part in machine learning, as they define the error, and thus the design objective while training a functional model. In particular, in the presence of noise the choice of an appropriate loss function parameterization is important. In this section we extend UNN regression by the $\epsilon$-insensitive loss.
4.1 The $\epsilon$-Insensitive Loss
In the case of noisy data sets, over-fitting effects may occur. The $\epsilon$-insensitive loss makes it possible to ignore errors up to a level of $\epsilon$, and avoids over-fitting to curvatures of the data that may only be caused by noise effects. With the design of a loss function, the emphasis of outliers can be controlled. First, the residuals are computed. In the case of unsupervised regression, the error is computed in two steps:
1. The distance function $\delta: \mathbb{R}^q \times \mathbb{R}^d \to \mathbb{R}$ maps the difference between the prediction $f(\mathbf{x})$ and the desired output value $\mathbf{y}$ to a value according to the distance w.r.t. a certain measure. We employ the Minkowski metric:

$$\delta(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{N} \left| f(\mathbf{x}_i) - \mathbf{y}_i \right|^p \right)^{1/p}, \qquad (5)$$

which corresponds to the Manhattan distance for $p = 1$, and to the Euclidean distance for $p = 2$.
2. The loss function $L: \mathbb{R} \to \mathbb{R}$ maps the residuals to the learning error. With the design of the loss function the influence of residuals can be controlled. In the best case the loss function is chosen according to the requirements of the underlying data mining model. Often, low residuals are penalized less than high residuals (e.g., with a quadratic function). We will concentrate on the $\epsilon$-insensitive loss in the following; a code sketch of both steps is given after this list.
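To make the two-step error computation concrete, here is a minimal NumPy sketch of the Minkowski distance of Eq. (5) together with the standard $\epsilon$-insensitive loss $L_\epsilon(r) = \max(0, |r| - \epsilon)$. The function names and the NumPy realization are our own illustration, and the loss formula is the usual textbook definition rather than a quotation from this section:

```python
import numpy as np

def minkowski_distance(predictions, targets, p=2):
    """Minkowski distance of Eq. (5): p=1 is Manhattan, p=2 is Euclidean."""
    residuals = np.abs(predictions - targets)
    return np.sum(residuals ** p) ** (1.0 / p)

def eps_insensitive_loss(residual, eps=0.1):
    """Standard eps-insensitive loss: residuals up to eps are ignored."""
    return max(0.0, abs(residual) - eps)

# Usage: compare noisy predictions f(x_i) to desired outputs y_i.
f_x = np.array([1.05, 2.00, 2.95])  # predictions f(x_i)
y = np.array([1.00, 2.00, 3.00])    # desired outputs y_i
delta = minkowski_distance(f_x, y, p=2)
print(delta, eps_insensitive_loss(delta, eps=0.1))
```

Small residuals inside the $\epsilon$-tube contribute no error, which is exactly what protects the model from fitting noise-induced curvature.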