algorithm. The has-a algorithm is similar, but requires that pattern 1 and one of patterns 2 and 3 are above the threshold. Both thresholds were found experimentally. In our experiments, the is-a threshold was 7 and the has-a threshold ranged from 20 to 50.
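The acceptance rule described above (pattern 1 together with one of patterns 2 and 3 above the threshold) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each pattern yields a single numeric score that is compared against one shared threshold, and the function name `passes_pattern_rule` is hypothetical.

```python
def passes_pattern_rule(scores, threshold):
    """Return True if pattern 1 and at least one of patterns 2 and 3
    score strictly above `threshold`.

    scores: (pattern1, pattern2, pattern3) numeric scores. How each
    score is computed is defined elsewhere and is an assumption here.
    """
    pattern1, pattern2, pattern3 = scores
    return pattern1 > threshold and (pattern2 > threshold or pattern3 > threshold)


# Example with the threshold value 7 reported in the text.
print(passes_pattern_rule((10, 8, 0), 7))  # True
print(passes_pattern_rule((10, 0, 0), 7))  # False
```

"Above the threshold" is read here as a strict comparison; if the intended test is inclusive, `>` would become `>=`.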
6 Exploiting Co-parents to Find More is-a Relationships
Examining the ontology built thus far reveals an interesting property when pairs of tags share the same child. Consider the following example: the ontology may contain “fiction → urban-fantasy” and “fantasy → urban-fantasy”, where “fiction” and “fantasy” are both parents for “urban-fantasy” w.r.t. the is-a relationship.⁵ However, the is-a relationship between “fiction” and “fantasy” may be missing. One possible reason for this is that people tend to use the more specific tags, leading to “fiction → urban-fantasy” and “fantasy → urban-fantasy”, so that “fiction → fantasy” does not occur above the relatively high threshold needed to avoid noise.
Hence we have the following hypothesis: in a co-parent structure it is more likely than usual that the two parents are in an is-a relationship. We therefore add the following step (Algorithm 6) to ONTECTAS: for such co-parent pairs, we re-examine the pair's confidences under a lower threshold and extract candidate tuples for an is-a relationship.
Algorithm 6. Co Parent Pruning
Input: (T) A set of tuples with is-a relationships in the form ⟨parentTag, childTag⟩; (F) A set of frequent itemsets
Output: (T′) An enhanced set of tuples with is-a relationships
1: T′ ← T
2: G ← A graph where each tuple in T corresponds to an edge from parentTag to childTag.
3: S ← All tuples of tags ⟨parent₁, parent₂, child⟩ s.t. (1) edge(parent₁ → child) ∈ G and (2) edge(parent₂ → child) ∈ G and (3) edge(parent₁ → parent₂) ∉ G and (4) edge(parent₂ → parent₁) ∉ G.
4: for all ⟨parent₁, parent₂, child⟩ ∈ S do
5:   if {parent₁, parent₂} is frequent and it satisfies the lower forward and reverse confidence thresholds then
6:     Add ⟨parent₁, parent₂⟩ to T′ with the more frequent tag as the parent.
7:   end if
8: end for
9: Return T′
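The steps of Algorithm 6 can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the authors' code: the frequent itemsets F are assumed to be a dictionary from tag sets to support counts that also contains the single-tag itemsets, the co-parent pairs are taken to be those with no existing is-a edge in either direction, and the lower forward and reverse confidence test is left to a caller-supplied, hypothetical callback `passes_lower_thresholds`, because its exact definition is not given in this section.

```python
from collections import defaultdict
from itertools import combinations


def add_co_parent_edges(tuples, support, passes_lower_thresholds):
    """Sketch of Algorithm 6.

    tuples: iterable of (parent_tag, child_tag) is-a pairs (T).
    support: dict mapping frozenset of tags -> support count (F, assumed
        to include the single-tag itemsets).
    passes_lower_thresholds: hypothetical callback (tag_a, tag_b) -> bool
        standing in for the lower forward/reverse confidence test.
    """
    edges = set(tuples)
    enhanced = set(edges)                      # step 1: T' <- T

    # Step 2: represent G by mapping each child to its set of parents.
    parents_of = defaultdict(set)
    for parent, child in edges:
        parents_of[child].add(parent)

    # Step 3: co-parent pairs with no is-a edge in either direction.
    candidates = set()
    for parents in parents_of.values():
        for a, b in combinations(sorted(parents), 2):
            if (a, b) not in edges and (b, a) not in edges:
                candidates.add((a, b))

    # Steps 4-8: keep pairs that are frequent and pass the relaxed test.
    for a, b in sorted(candidates):
        if frozenset((a, b)) not in support:
            continue
        if not passes_lower_thresholds(a, b):
            continue
        # The more frequent tag becomes the parent (ties go to `a`).
        if support[frozenset((a,))] >= support[frozenset((b,))]:
            enhanced.add((a, b))
        else:
            enhanced.add((b, a))
    return enhanced


# Toy input with made-up support counts, for illustration only.
T = {("fiction", "urban-fantasy"), ("fantasy", "urban-fantasy")}
F = {
    frozenset({"fiction"}): 100,
    frozenset({"fantasy"}): 40,
    frozenset({"urban-fantasy"}): 10,
    frozenset({"fiction", "fantasy"}): 15,
}
enhanced = add_co_parent_edges(T, F, lambda a, b: True)
print(("fiction", "fantasy") in enhanced)  # True
```

Passing the confidence test as a callback keeps the sketch independent of how the forward and reverse confidences are actually thresholded. The tie-breaking rule for equally frequent tags is also an assumption, since the algorithm does not specify one.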
As a final step of the ONTECTAS algorithm, following standard practice in
ontology extraction algorithms, if the graph of relationships is disconnected, we
add a generic “Entity” root node and make it the parent of all orphan nodes.
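The final root-attachment step can be sketched as follows. This is a minimal illustration under assumptions: connectivity is checked on the undirected version of the graph, an orphan is taken to be any tag that never appears as a child, the set of tags is derived from the tuples themselves, and the function names are hypothetical.

```python
from collections import defaultdict


def is_connected(nodes, edges):
    """Check connectivity of the graph, ignoring edge direction."""
    if not nodes:
        return True
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == nodes


def add_entity_root(tuples, root="Entity"):
    """If the relationship graph is disconnected, make `root` the parent
    of every tag that has no parent; otherwise return the edges unchanged."""
    edges = set(tuples)
    nodes = {tag for edge in edges for tag in edge}
    if is_connected(nodes, edges):
        return edges
    has_parent = {child for _, child in edges}
    orphans = nodes - has_parent
    return edges | {(root, orphan) for orphan in orphans}


# Two unrelated is-a tuples form a disconnected graph.
result = add_entity_root({("fiction", "fantasy"), ("music", "jazz")})
print(sorted(result))
```

In this sketch, tags that occur in no tuple at all are not represented; handling them would require passing the full tag set separately.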
⁵ Here, “fiction” → “urban-fantasy” means “urban-fantasy” is-a “fiction”.