Information Technology Reference
In-Depth Information
different groups. Although computationally expensive, this is a very popular technique,
overcoming limitations such as differences in size or shape of clusters. In this technique
time complexity is about O ( n 2 ) .
3.3
Self-Organizing System for Information Granulation
SOSIG ( S elf- O rganizing S ystem for I nformation G ranulation) algorithm is a system
designed for detecting granules (groups) present in data. It takes a parameter rg
[0 , 1]
denoting required level of details (resolution) of generated result. Originally it has been
designed to deal with point-type data, but in order to reduce its high time complexity it
has been adapted to hyperboxes processing.
Using partitioning of both hyperboxes and data objects, it is possible to create a three-
level structure, which can be helpful to understand relations between data objects. The
main level of the hierarchy is defined by clusters of hyperboxes and the third lowest
level consists of point-type data clusters. The middle level is composed of granules
containing these third level clusters, which are entirely located in one of the hyperboxes.
SOSIG creates a network structure of connected objects (according to a chosen ap-
proach: points or hyperboxes) forming clusters. The organization of the system, includ-
ing the objects as well as the connections, is constructed on the basis of relationships
between input data without any external supervision. The structure elements are repre-
sentatives of input data, that is, an individual object from the structure stands for one
or more object from the input set. As a result, the number of representatives is much
smaller than input data without losing information.
To have convenient and compact notation, let us assume input data are defined as
an information system IS =( U,A ) [9], where U =
{
x 1 ,...,x n }
is a set of objects
and A =
is a set of attributes. The result generated by SOSIG is also
described by an information system IS =( Y,A
{
a 1 ,...,a k }
∪{
a gr }
) , where the last attribute a gr :
Y
→{
1 ,...,nc
}
denotes the label of generated cluster and card ( Y )
card ( U ) and
Y ( δ ( x, y ) <NR ) . Parameter NR in general defines the area of objects
interactions and is re-calculated in every iteration according to current connections in
the network. The steps of the main (learning) part of SOSIG are shown in Algorithm 1.
For a detailed description of the method see [11] and [6].
A measure of usefulness of objects is the similarity level expressed by Equation 4,
which defines a degree of closeness between the examined representative point y and the
most similar point x from the training data. Only training points from the neighborhood
of y (defined by NR) are considered.
x
U
y
s l ( y )= NR
min(
{
δ ( y,x ): x
U
}
)
(4)
To calculate the distance between hyperboxes measure expressed by Equation 5 was
used:
d ( I A ,I B )=(
a B
a A
+
b B
b A
) / 2
(5)
where
denote the sum of subtractions of minimal ( a )and
maximal ( b ) values of granules I A and I B respectively in every dimension. The equation
has been introduced in [1].
a B
a A
and
a B
a A
 
Search WWH ::




Custom Search