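where

$$
\mathrm{heom}_i(x_i, y_i) =
\begin{cases}
\mathrm{overlap}(x_i, y_i), & \text{if the } i\text{th variable is categorical} \\[4pt]
\dfrac{|x_i - y_i|}{\mathrm{range}_i}, & \text{if the } i\text{th variable is real}
\end{cases}
$$
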
overlap(x_i, y_i) denotes the Hamming distance, defined as

$$
\mathrm{overlap}(x_i, y_i) =
\begin{cases}
0, & x_i = y_i \\
1, & x_i \neq y_i
\end{cases}
$$

and $\mathrm{range}_i$ is a scaling factor for the $i$th continuous variable.
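
The per-variable distance defined above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the boolean mask marking categorical variables, and the example scaling factors are all assumptions made for the sketch. The text only says that range_i is a scaling factor, so the values passed here are arbitrary.

```python
from typing import Sequence


def overlap(x_i, y_i) -> int:
    """Return 0 when the two values are equal and 1 otherwise."""
    return 0 if x_i == y_i else 1


def heom_component(x_i, y_i, is_categorical: bool, range_i: float = 1.0) -> float:
    """Per-variable distance: overlap for categorical values,
    scaled absolute difference for real values."""
    if is_categorical:
        return float(overlap(x_i, y_i))
    # Assumption: range_i is a positive scaling factor for variable i.
    return abs(x_i - y_i) / range_i


def heom_components(x: Sequence, y: Sequence,
                    categorical: Sequence[bool],
                    ranges: Sequence[float]) -> list:
    """Apply heom_component to every variable of two mixed-type vectors."""
    return [heom_component(a, b, c, r)
            for a, b, c, r in zip(x, y, categorical, ranges)]


# Hypothetical example: two categorical variables and one real variable.
x = ["red", 2.0, "small"]
y = ["blue", 5.0, "small"]
components = heom_components(x, y, [True, False, True], [1.0, 10.0, 1.0])
print(components)
```

The first variable differs (distance 1), the real variable contributes |2 - 5| / 10, and the last variable matches (distance 0).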
3.3.9.2 Heterogeneous Value Difference Metric
The heterogeneous value difference metric (HVDM) is defined as

$$
\mathrm{HVDM}(x, y) = \sqrt{\sum_{i=1}^{N} \mathrm{hvdm}_i(x_i, y_i)^2}
$$

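where

$$
\mathrm{hvdm}_i(x_i, y_i) =
\begin{cases}
\mathrm{vdm}(x_i, y_i), & \text{if the } i\text{th variable is categorical} \\[4pt]
\dfrac{|x_i - y_i|}{\mathrm{range}_i}, & \text{if the } i\text{th variable is real}
\end{cases}
$$
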
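
A rough Python sketch of how the pieces of HVDM fit together is given below. Several things here are assumptions: the aggregation is taken to be the usual square root of the sum of squared per-variable distances, the function names are illustrative, and the categorical distance vdm is passed in as a callable because this section does not spell out its definition. The `toy_vdm` used in the example is only a 0/1 mismatch placeholder, not the actual value difference metric.

```python
import math
from typing import Callable, Sequence


def hvdm_component(x_i, y_i, is_categorical: bool, range_i: float,
                   vdm: Callable[[object, object], float]) -> float:
    """Per-variable distance: vdm for categorical values,
    scaled absolute difference for real values."""
    if is_categorical:
        return vdm(x_i, y_i)
    # Assumption: range_i is a positive scaling factor for variable i.
    return abs(x_i - y_i) / range_i


def hvdm(x: Sequence, y: Sequence, categorical: Sequence[bool],
         ranges: Sequence[float],
         vdm: Callable[[object, object], float]) -> float:
    """Combine the squared per-variable distances and take the square root."""
    total = sum(hvdm_component(a, b, c, r, vdm) ** 2
                for a, b, c, r in zip(x, y, categorical, ranges))
    return math.sqrt(total)


def toy_vdm(x_i, y_i) -> float:
    """Placeholder only: a 0/1 mismatch standing in for a real vdm."""
    return 0.0 if x_i == y_i else 1.0


# Hypothetical example: one categorical and one real variable.
d = hvdm(["a", 1.0], ["b", 4.0], [True, False], [1.0, 6.0], toy_vdm)
print(d)  # sqrt(1.0**2 + 0.5**2)
```

Passing the categorical distance as a parameter keeps the sketch independent of any particular way of estimating vdm from data.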
3.3.10 Considerations about Representation
In Gray encoding, two consecutive strings differ by only 1 bit; thereby, affinity in
problem space is to some extent maintained in shape-space. A real number x in [0,1]
can be represented in a binary encoding by applying the transformation
floor(255x + 0.5) and then encoding the result in 8 bits. In plain binary, however,
adjacent values can differ in many bits (127 = 01111111 and 128 = 10000000 differ in
all eight). Therefore, binary encoding is not suitable for achieving good
generalization, because matching rules should accurately represent
data proximity in the problem space.
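
The contrast between the two encodings can be checked with a short Python sketch. It assumes the standard reflected binary Gray code (n XOR n >> 1), which the text does not name explicitly; the helper names are illustrative.

```python
import math


def quantize(x: float) -> int:
    """Map a real number in [0, 1] to an integer in 0..255 via floor(255x + 0.5)."""
    return math.floor(255 * x + 0.5)


def to_binary(n: int, bits: int = 8) -> str:
    """Plain binary encoding, zero-padded to the given width."""
    return format(n, f"0{bits}b")


def to_gray(n: int, bits: int = 8) -> str:
    """Reflected binary Gray code (assumed variant): n XOR (n >> 1)."""
    return format(n ^ (n >> 1), f"0{bits}b")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


# Two nearby real values that quantize to the adjacent integers 127 and 128.
a, b = quantize(0.498), quantize(0.502)
print(to_binary(a), to_binary(b), hamming(to_binary(a), to_binary(b)))
# 01111111 10000000 8
print(to_gray(a), to_gray(b), hamming(to_gray(a), to_gray(b)))
# 01000000 11000000 1
```

Two values that are close in the problem space end up at Hamming distance 8 in plain binary but at distance 1 in Gray code.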
Matching rules also affect how the shape-space is searched. For instance, the rcb
matching rule produced a gridlike shape; the r-chunk matching rule generated similar
but simpler shapes; and the Hamming distance and Rogers and Tanimoto (R&T) matching
rules produced a “fractal”-like shape. The shapes of the areas covered by the rcb and r-chunk
matching rules were not affected by changing the encoding from binary to Gray. This
was not really unexpected because similarity between two real values is not reflected
in their binary representations (Gonzalez et al., 2003; Ji and Dasgupta, 2004).
 