• $diss_{type}(d, e) < diss_{type}(b, c) < diss_{type}(b, f) < diss_{type}(a, g)$
• $diss_{type}(d, e) > diss_{type}(b, d)$

Definition 5.1. The dissimilarity $diss_{type} : T_C \times T_C \to [0, 1]$, where $T_C$ is a set of conceptual types, is defined as follows:

$$\forall (t_1, t_2) \in T_C \times T_C, \quad diss_{type}(t_1, t_2) = \sum_{t_i \in \langle t, t_1 \rangle,\ t_i \neq t} 2^{-2-depth(t_i)} + \sum_{t_i \in \langle t, t_2 \rangle,\ t_i \neq t} 2^{-2-depth(t_i)}$$

with
• $t \in T_C$ the nearest common parent of $t_1$ and $t_2$;
• $\langle t, t' \rangle$ the shortest path between $t$ and $t'$;
• $depth(t_i)$ the depth of $t_i$ in the type hierarchy, with $depth(Entity) = 0$.
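As an illustration of Definition 5.1, the following minimal Python sketch computes the type dissimilarity on a small hierarchy. The hierarchy itself (the `PARENT` map and the placement of the types a to g) is hypothetical, chosen only so that the orderings listed above can be checked. The helper names are also assumptions. The sketch reads each sum as running over the nodes on the path from the nearest common parent to the type, excluding the common parent itself.

```python
from typing import Dict, List, Optional

# Hypothetical type hierarchy (child -> parent); "Entity" is the root, depth 0.
PARENT: Dict[str, Optional[str]] = {
    "Entity": None,
    "a": "Entity", "g": "Entity",
    "b": "a", "c": "a",
    "d": "b", "e": "b",
    "f": "c",
}


def ancestors(t: str) -> List[str]:
    """Return the chain from t up to the root, t included."""
    chain = []
    node: Optional[str] = t
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(t: str) -> int:
    """Depth of t in the hierarchy, with depth("Entity") == 0."""
    return len(ancestors(t)) - 1


def nearest_common_parent(t1: str, t2: str) -> str:
    """First node on the way up from t2 that is also t1 or an ancestor of t1."""
    up1 = set(ancestors(t1))
    for node in ancestors(t2):
        if node in up1:
            return node
    raise ValueError("types do not share a root")


def branch_cost(t: str, leaf: str) -> float:
    """Sum 2^(-2 - depth(t_i)) over the path from leaf up to t, excluding t."""
    total = 0.0
    node = leaf
    while node != t:
        total += 2.0 ** (-2 - depth(node))
        node = PARENT[node]
    return total


def diss_type(t1: str, t2: str) -> float:
    """Type dissimilarity: cost of both branches below the common parent."""
    t = nearest_common_parent(t1, t2)
    return branch_cost(t, t1) + branch_cost(t, t2)
```

On this hypothetical hierarchy, `diss_type("d", "e")` is 0.0625, `diss_type("b", "c")` is 0.125, `diss_type("b", "f")` is 0.15625 and `diss_type("a", "g")` is 0.25, so types that diverge closer to the root are more dissimilar. A type and its direct subtype, as in `diss_type("b", "d")`, come out at 0.03125, lower than two siblings.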
5.3.1.2 Similarity between two referents
The similarity between the values of two concepts depends on the application domain and
the data type used to express the individual markers. Therefore, several similarity measures
between referents are defined.
If at least one of the referents of the two concepts is undefined, the similarity is equal to 1.
Definition 5.2. $\forall (c_1, c_2) \in C^2$, with $c_1 = [t_1 : v_1]$, $c_2 = [t_2 : v_2]$ and $(t_1, t_2) \in T_C \times T_C$,

$$sim_{ref}(v_1, v_2) = \begin{cases} 1 & \text{if } v_1 \text{ or } v_2 \text{ is undefined,} \\ sim_{ref_{strings}}(v_1, v_2) \text{ or } sim_{ref_{num}}(v_1, v_2) & \text{otherwise,} \end{cases}$$

with $sim_{ref_{strings}}$ and $sim_{ref_{num}}$ two similarity measures for referents described hereafter.
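The case distinction of Definition 5.2 can be sketched as a small dispatch function. This is only an illustration under several assumptions: an undefined referent is represented by `None`, the choice between the string and numeric measures is made from the Python type of the values, and the two measures are passed in as arguments because their definitions depend on the application. Raising an error for mixed types is also an assumption, since the text does not cover that case.

```python
from typing import Callable, Optional, Union

# Assumption: a referent is a string, a number, or None when undefined.
Referent = Optional[Union[str, int, float]]


def sim_ref(
    v1: Referent,
    v2: Referent,
    sim_strings: Callable[[str, str], float],
    sim_num: Callable[[float, float], float],
) -> float:
    """Similarity between two referents, following the cases of Definition 5.2."""
    # First case: an undefined referent is compatible with anything.
    if v1 is None or v2 is None:
        return 1.0
    # Otherwise pick the measure matching the data type of the referents.
    if isinstance(v1, str) and isinstance(v2, str):
        return sim_strings(v1, v2)
    if isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
        return sim_num(v1, v2)
    # Assumption: mixed types are rejected rather than scored.
    raise TypeError("referents must both be strings or both be numbers")
```

Passing the measures as parameters keeps the sketch independent of any particular string or numeric measure; a caller would supply the ones chosen for its application domain.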
In the next sections, we define only the measures used within the case study. For a detailed
definition and description of the other measures, see Laudy (2010).
Similarity of “String” referents
The idea behind $sim_{ref_{strings}}$ is that, if one of the strings contains a large part of the other, the two strings should be declared sufficiently similar to be fused. The measure relies on the proportion of substrings shared between the two referents, relative to the length of the shorter one.
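The idea can be sketched in Python as follows. The exact way the shared substrings are counted is not reproduced here, so this sketch makes a simplifying assumption: it uses the length of the single longest common substring as the shared part. The function names and the handling of empty strings are likewise assumptions made for illustration.

```python
def longest_common_substring_length(s1: str, s2: str) -> int:
    """Length of the longest contiguous substring shared by s1 and s2."""
    best = 0
    # prev[j] = length of the common suffix of the previous prefix of s1 and s2[:j]
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0] * (len(s2) + 1)
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def sim_ref_strings(s1: str, s2: str) -> float:
    """Shared substring length divided by the length of the shorter string."""
    shortest = min(len(s1), len(s2))
    if shortest == 0:
        # Assumption: an empty string shares nothing with any other string.
        return 0.0
    return longest_common_substring_length(s1, s2) / shortest
```

With this reading, `sim_ref_strings("Paris", "Paris, France")` is 1.0 because the shorter string is entirely contained in the longer one, while `sim_ref_strings("abcd", "abxx")` is 0.5.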
Definition 5.3. $sim_{ref_{strings}} : S^2 \to [0, 1]$ is defined as follows: $\forall (s_1, s_2) \in S^2$, where $S$ is the set of all strings,

$$sim_{ref_{strings}}(s_1, s_2) = \frac{lengthComSubString(s_1, s_2)}{\min(length(s_1), length(s_2))}$$
where