algorithm. The has-a algorithm is similar, but requires that pattern 1 and
one of patterns 2 and 3 be above the threshold. Both thresholds were found
experimentally: in our experiments, the is-a threshold was 7 and the has-a
threshold ranged from 20 to 50.
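The has-a decision rule can be sketched as a small predicate. This is a minimal sketch under stated assumptions: the counts p1, p2 and p3 are hypothetical per-pattern match counts for a tag pair (the patterns themselves are defined earlier in the paper and are not reproduced here), "above the threshold" is read as strictly greater, and "one of patterns 2 and 3" is read as at least one of them.

```python
def is_has_a(p1: int, p2: int, p3: int, threshold: int) -> bool:
    """Hypothetical sketch of the has-a rule.

    Pattern 1 must exceed the threshold, together with at least one of
    patterns 2 and 3. "Above" is assumed to mean strictly greater.
    """
    return p1 > threshold and (p2 > threshold or p3 > threshold)


# The section reports has-a thresholds between 20 and 50; 20 is used here.
HAS_A_THRESHOLD = 20
print(is_has_a(35, 3, 25, HAS_A_THRESHOLD))  # True: patterns 1 and 3 exceed 20
print(is_has_a(35, 3, 10, HAS_A_THRESHOLD))  # False: neither pattern 2 nor 3 exceeds 20
```

Keeping the threshold as a parameter reflects that it was tuned experimentally rather than fixed.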
6 Exploiting Co-parents to Find More is-a Relationships
Examining the ontology built thus far reveals an interesting property when pairs
of tags share the same child. Consider the following example: the ontology may
contain “fiction → urban-fantasy” and “fantasy → urban-fantasy”, where “fiction”
and “fantasy” are both parents of “urban-fantasy” w.r.t. the is-a relationship.⁵
However, the is-a relationship between “fiction” and “fantasy” may be missing.
One possible reason is that people tend to use the more specific tags, leading
to “fiction → urban-fantasy” and “fantasy → urban-fantasy”, so that
“fiction → fantasy” does not occur above the relatively high threshold needed
to avoid noise.
This leads to the following hypothesis: in a co-parent structure, the two
parents are more likely than usual to be in an is-a relationship. Hence, we
add the following step (Algorithm 6) to ONTECTAS: for such co-parent pairs,
we re-examine the pair's confidences under a lower threshold and extract
candidate tuples for an is-a relationship.
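The lowered confidence test can be made concrete with the standard association-rule definition of confidence over tag supports. This is an assumption, since the section does not restate how ONTECTAS defines its forward and reverse confidences. For a co-parent pair p_1, p_2, one confidence in each direction would be:

```latex
\mathrm{conf}(p_1 \Rightarrow p_2) = \frac{\mathrm{sup}(\{p_1, p_2\})}{\mathrm{sup}(\{p_1\})},
\qquad
\mathrm{conf}(p_2 \Rightarrow p_1) = \frac{\mathrm{sup}(\{p_1, p_2\})}{\mathrm{sup}(\{p_2\})}
```

With hypothetical supports sup({fiction}) = 400, sup({fantasy}) = 250 and sup({fiction, fantasy}) = 100, the two confidences are 100/400 = 0.25 and 100/250 = 0.4, so such a pair could fail the original high threshold yet pass a lowered one.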
Algorithm 6. Co Parent Pruning
Input: (T) A set of tuples with is-a relationships in the form ⟨parentTag, childTag⟩; (F)
A set of frequent itemsets
Output: (T′) An enhanced set of tuples with is-a relationships
1: T′ ← T
2: G ← a graph where each tuple in T corresponds to an edge from parentTag to
childTag
3: S ← all tuples of tags ⟨parent1, parent2, child⟩
s.t. (1) edge(parent1 → child) ∈ G and (2) edge(parent2 → child) ∈ G and
(3) edge(parent1 → parent2) ∉ G and (4) edge(parent2 → parent1) ∉ G
4: for all ⟨parent1, parent2, child⟩ ∈ S do
5: if {parent1, parent2} is frequent and satisfies the lower forward and reverse
confidence thresholds then
6: add ⟨parent1, parent2⟩ to T′ with the more frequent tag as the parent
7: end if
8: end for
9: return T′
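Algorithm 6 can be sketched in Python as follows. This is one possible reading, not the authors' implementation, and several details are assumptions: F is represented as a dict mapping frozenset itemsets to support counts and is assumed to contain every frequent itemset (so, by downward closure, both singletons of any frequent pair are present); the names co_parent_pruning, min_forward and min_reverse are hypothetical; and "forward" confidence is taken to mean conf(child ⇒ parent) for the ordered candidate pair, with "reverse" the opposite direction. Co-parent pairs are enumerated as unordered pairs per child, which covers both orderings of ⟨parent1, parent2⟩ in S.

```python
from itertools import combinations


def co_parent_pruning(tuples, freq, min_forward, min_reverse):
    """Hypothetical sketch of Algorithm 6 (Co Parent Pruning).

    tuples: set of (parentTag, childTag) is-a pairs, i.e. T.
    freq: dict from frozenset itemsets to support counts, i.e. F; assumed to
          hold every frequent itemset, so singletons of frequent pairs exist.
    min_forward, min_reverse: the lowered confidence thresholds (assumed names).
    """
    enhanced = set(tuples)  # step 1: T' <- T

    # Step 2: represent G as a child -> set-of-parents adjacency map.
    parents_of = {}
    for parent, child in tuples:
        parents_of.setdefault(child, set()).add(parent)

    # Steps 3-4: every unordered pair of parents sharing a child.
    for parents in parents_of.values():
        for p1, p2 in combinations(sorted(parents), 2):
            if (p1, p2) in tuples or (p2, p1) in tuples:
                continue  # conditions (3)/(4): the parents are already related in G
            pair = frozenset((p1, p2))
            if pair not in freq:
                continue  # step 5: {parent1, parent2} must be frequent

            # Step 6: the more frequent tag becomes the parent (ties keep p1).
            if freq[frozenset([p1])] >= freq[frozenset([p2])]:
                parent, sub = p1, p2
            else:
                parent, sub = p2, p1

            # Assumed reading of the two confidences, computed from supports.
            forward = freq[pair] / freq[frozenset([sub])]     # conf(sub => parent)
            reverse = freq[pair] / freq[frozenset([parent])]  # conf(parent => sub)
            if forward >= min_forward and reverse >= min_reverse:
                enhanced.add((parent, sub))

    return enhanced  # step 9: return T'


# Toy data mirroring the fiction / fantasy / urban-fantasy example (made-up counts).
T = {("fiction", "urban-fantasy"), ("fantasy", "urban-fantasy")}
F = {
    frozenset(["fiction"]): 400,
    frozenset(["fantasy"]): 250,
    frozenset(["urban-fantasy"]): 80,
    frozenset(["fiction", "fantasy"]): 100,
    frozenset(["fiction", "urban-fantasy"]): 60,
    frozenset(["fantasy", "urban-fantasy"]): 70,
}
print(sorted(co_parent_pruning(T, F, min_forward=0.3, min_reverse=0.2)))
# [('fantasy', 'urban-fantasy'), ('fiction', 'fantasy'), ('fiction', 'urban-fantasy')]
```

The membership checks use the original tuples rather than the growing result, matching step 3, which builds S from G before any new tuples are added.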
As a final step of the ONTECTAS algorithm, following standard practice in
ontology extraction algorithms, if the graph of relationships is disconnected, we
add a generic “Entity” root node and make it the parent of all orphan nodes.
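The final root-attachment step can be sketched as follows. This is a minimal sketch assuming that the graph's nodes are exactly the tags appearing in the is-a tuples, that "disconnected" means more than one weakly connected component (edge direction ignored), and that an orphan is a tag that never appears as a child; the function names and the tuple-set representation are hypothetical.

```python
from collections import defaultdict, deque


def is_weakly_connected(nodes, edges):
    """True if the graph has one component when edge directions are ignored."""
    if not nodes:
        return True
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over the undirected graph
        node = queue.popleft()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return seen == set(nodes)


def add_entity_root(tuples, root="Entity"):
    """Attach a generic root above every orphan if the is-a graph is disconnected."""
    nodes = {tag for edge in tuples for tag in edge}
    if is_weakly_connected(nodes, tuples):
        return set(tuples)
    children = {child for _, child in tuples}
    orphans = nodes - children  # tags that never appear as a child
    return set(tuples) | {(root, orphan) for orphan in orphans}


ontology = {
    ("fiction", "fantasy"),
    ("fiction", "urban-fantasy"),
    ("programming", "python"),
}
print(sorted(add_entity_root(ontology)))
# [('Entity', 'fiction'), ('Entity', 'programming'), ('fiction', 'fantasy'),
#  ('fiction', 'urban-fantasy'), ('programming', 'python')]
```

If the is-a graph is acyclic, every component has at least one orphan, so attaching all orphans to the root yields a single connected graph. A connected graph is left unchanged even if it has several parentless tags, since the step is described only for the disconnected case.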
5 Here, “fiction → urban-fantasy” means “urban-fantasy” is-a “fiction”.
 