Information Technology Reference
In-Depth Information
Algorithm 1.
Main steps of SOSIG algorithm
Data
:
-
IS
=(
U,A
) - an information system, where
U
=
{x
1
,...,x
n
}
is a set of objects and
A
=
{a
1
,...,a
k
}
is a set of attributes,
-
{δ
a
:
a ∈ A}
- a set of distance function of form
δ
a
:
V
a
× V
a
→
[0
, ∞
),where
V
a
is
a set of values for attribute
a ∈ A
and a global distance function
δ
:
U × U →
[0
, ∞
)
defined by
δ
(
x,y
)=
fusion
(
δ
a
1
(
a
1
(
x
)
,a
1
(
y
))
,...,δ
a
k
(
a
k
(
x
)
,a
k
(
y
)))
-
size
net
∈{
0
,
1
,...,card
(
U
)
}
- initial size of the network,
rg ∈
[0
,
1] - resolution of
granulation,
Result
:
IS
=(
Y,A ∪{a
gr
}
) - an information system, where the last attribute
a
gr
:
Y →{
1
,...,nc}
denotes label of generated granule and
card
(
Y
)
≤ card
(
U
) and
∀x ∈ U∃y ∈ Yδ
(
x,y
)
<NR
begin
[
NR
init
,Y
]
←−
initialize(
U,A,size
net
)
;
for
y
i
,y
j
∈ Y,i
=
j
do
/*form clusters*/
if
δ
(
y
i
,y
j
)
<NR
init
then
connect(
y
i
,y
j
)
;
NR
NR
init
;
while
¬
stopIterations
(
Y
)
do
for
y ∈ Y
do
Δ
(
y
)=(
δ
(
y,x
))
x∈U
; /*calculate distances between input data*/;
s
l
(
y
)=
NR−
min
Δ
(
y
);/*similarity level of the object from the network*/;
delete(
U,A, Y
)
; /*remove redundant network objects*/;
for
y
i
,y
j
∈ Y,i
=
j
do
/* reconnect objects*/
if
δ
(
y
i
,y
j
)
<NR
then
connect(
y
i
,y
j
)
;
a
gr
(
y
i
)
←−
0;
a
gr
(
y
j
)
←−
0;
grLabel ←−
1;
for
y
i
∈ Y
do
/*label objects*/
if
a
gr
=0
then
a
gr
(
y
i
)
←− grLabel
;
for
y
j
∈ Y,j
=
i
do
if
connected(
y
i
,y
j
)
then
a
gr
(
y
j
)
←− grLabel
;
grLabel ←− grLabel
+1;
for
y
i
∈ Y
do
/*calculate the nearest neighbor for every objects*/;
δ
NN
(
y
i
)=min(
{δ
(
y
i
,y
j
):
y
j
∈ Y
&
j
=
i}
);
NR
←−
δ
NN
(
y
)
;/*new value of NR*/;
if
¬
stopIterations
(
Y
)
/*test the stopping condition */
then
joinNotRepresented(
U,Y, NR, Δ
)
;
adjust(
Y,U,A,NR
)
;
←−
rg
·
y
∈
Y
card
(
Y
)
To control the size of the network there is a removal step, where useless objects are
removed. It affects redundant objects from the network representatives. The term redun-
dant applies to the points with the same input object (from
U
) in their neighborhood.
The best points stay in the network and also the ones which are not redundant for other