In sum, in contrast to previous works on ontology extraction from CTSs, our
method is capable of detecting both has-a and is-a relationships and explicitly
identifying each. Our multi-stage algorithm also extracts high-quality
relationships between multi-word tags.
3 Problem Statement
A collaborative tagging system [22] is a 4-tuple C = (U, T, I, Y), where U is a set
of users, T is the set of tags used by the users, I is the set of items (resources) to
which tags are assigned by users, and Y, the set of tag assignments, is a ternary
relation on tags, users, and items, i.e., Y ⊆ U × T × I.
Specific CTSs may vary in detail from our definition above, e.g., IMDb does
not have user information. We can model such CTSs by dropping U and defining
Y ⊆ T × I as a binary relation. CTSs such as [11] allow users to declare
their own is-a relationships. User-supplied is-a relationships can augment those
automatically extracted but cannot supplant them because of the scale.
This paper studies how to efficiently extract is-a and has-a relationships
between tags in a given CTS. The output ontology consists of ⟨tag1, tag2, label⟩
tuples, where tag1 is the super class and label is either is-a or has-a.² E.g., the
tuple ⟨OS, Windows, is-a⟩ indicates that Windows is a kind of OS.
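The definitions above can be illustrated with a minimal Python sketch, which is not part of the paper. It holds the tag-assignment relation Y as a set of (user, tag, item) triples, projects it onto (tag, item) pairs for a CTS without user information, and represents an output tuple as a labelled triple. The names `Relation`, `drop_users`, and `describe`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Set, Tuple


class Relation(NamedTuple):
    """One output tuple <tag1, tag2, label> (hypothetical representation)."""
    tag1: str   # super class (parent) label
    tag2: str
    label: str  # "is-a" or "has-a"


def drop_users(assignments: Set[Tuple[str, str, str]]) -> Set[Tuple[str, str]]:
    """Project Y ⊆ U × T × I onto T × I for a CTS without user information."""
    return {(tag, item) for _user, tag, item in assignments}


def describe(rel: Relation) -> str:
    """Read a tuple as 'tag2 is-a tag1' or 'tag1 has-a tag2'."""
    if rel.label == "is-a":
        return f"{rel.tag2} is-a {rel.tag1}"
    if rel.label == "has-a":
        return f"{rel.tag1} has-a {rel.tag2}"
    raise ValueError(f"unknown label: {rel.label!r}")


# Hypothetical sample data: two users tag the same item.
Y = {
    ("alice", "OS", "item1"),
    ("bob", "OS", "item1"),
    ("alice", "Windows", "item1"),
}
pairs = drop_users(Y)  # duplicate (tag, item) pairs collapse in the set
ontology = {Relation("OS", "Windows", "is-a")}
```

Using sets mirrors the relational definition: once users are dropped, repeated assignments of the same tag to the same item collapse into a single pair.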
4 Ontology Extraction from Collaborative TAgging
Systems (ONTECTAS) Algorithm
Algorithm 1. ONTECTAS
Input: (D) A set of ⟨item, tag⟩ 2-tuples or ⟨user, item, tag⟩ 3-tuples
Output: (O) Ontology of tags with is-a and has-a relationships
1: D ← Preprocess(D) /* D is a set of ⟨item, tag⟩ tuples */
2: T_basic, F ← Association Rule Tuple Detection(D) /* Algorithm 2 */
3: T_pruned ← Bigram Filtering(T_basic) /* Algorithm 3 */
4: T_headword, O ← Headword Detection(T_pruned) /* Algorithm 4 */
5: O ← O ∪ is-a Relationship Detection(T_headword, is-a-patterns, is-a-threshold) /* Algorithm 5 */
6: O ← O ∪ has-a Relationship Detection(T_headword, has-a-patterns, has-a-threshold)
7: T_coparent ← Co Parent Pruning(T_headword, F) /* Algorithm 6 */
8: Return O ∪ is-a Relationship Detection(T_coparent, is-a-patterns, is-a-threshold) /* Algorithm 5 */
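Step 2 of Algorithm 1 detects candidate tuples through association rules. The paper's Algorithm 2 is not reproduced in this section, so the following is only a rough Python sketch under stated assumptions: it uses the standard rule confidence conf(a → b) = |items(a) ∩ items(b)| / |items(a)|, and it assumes that checking a tag pair in both directions corresponds to conf(a → b) and conf(b → a). The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def items_per_tag(pairs):
    """Map each tag to the set of items it was assigned to."""
    index = defaultdict(set)
    for item, tag in pairs:
        index[tag].add(item)
    return index


def confidence(index, a, b):
    """Standard rule confidence conf(a -> b); 0.0 if tag a is unseen."""
    items_a = index.get(a, set())
    if not items_a:
        return 0.0
    return len(items_a & index.get(b, set())) / len(items_a)


# Hypothetical <item, tag> tuples, as produced by preprocessing.
D = [
    ("i1", "os"), ("i1", "windows"),
    ("i2", "os"), ("i2", "linux"),
    ("i3", "os"), ("i3", "windows"),
]
index = items_per_tag(D)
conf_windows_os = confidence(index, "windows", "os")  # 2 of 2 items
conf_os_windows = confidence(index, "os", "windows")  # 2 of 3 items
```

In this toy data every item tagged windows is also tagged os, but not the reverse. Such an asymmetry is one signal that could mark os as the more general tag; how the actual algorithm thresholds and combines the two values is not specified here.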
Our ONTECTAS algorithm for ontology extraction (Algorithm 1) consists of six
phases. First, data is preprocessed and cleaned. Next, we extract candidate tag
tuples via association rule mining using forward and reverse confidence. We then
2 In both relationships tag2 is-a tag1 and tag1 has-a tag2, we refer to tag1 as the
super class label or the parent label for convenience, by abusing terminology.
 