Section 2 discusses related work. Section 3 formalizes the problem studied in this
paper. Section 8 concludes and discusses future work.
2 Related Work
Some other works have studied extracting ontologies from CTSs. Some
approaches [16,18,2] match CTS tags to concepts in general-purpose ontologies
such as WordNet, resulting in a graph of tags. However, because CTSs are
ad hoc and use terms dynamically, general-purpose ontologies miss many terms
as well as edges (i.e., relationships). For example, our experiments show that
WordNet misses more than 25% of correct edges between concepts extracted
from Del.icio.us, even when both parent and child concepts are in WordNet.
Schmitz [23] constructs weighted graphs based on conditional probabilities
between pairs of tags. His algorithm cannot identify the exact relationship
(e.g., is-a) between terms; it simply says they are related, not how. By
contrast, our algorithm pinpoints is-a and has-a relationships between terms.
Heymann and Garcia-Molina [10] create an ontology by vectorizing the tags
and computing the cosine similarity between tags. However, their method puts
every tag from the similarity matrix into the taxonomy, which causes many
erroneous edges. Their work lacks an evaluation.
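As a rough illustration of the vector-based comparison described above, the
following sketch represents each tag by how often it annotates each item and
computes the cosine similarity between two such vectors. The toy annotations,
the item-count representation, and the function names are assumptions made for
illustration, not details taken from [10].

```python
import math
from collections import Counter

# Hypothetical toy data: (tag, item) annotation pairs.
annotations = [
    ("python", "doc1"), ("python", "doc2"), ("programming", "doc1"),
    ("programming", "doc2"), ("programming", "doc3"), ("recipes", "doc4"),
]

def tag_vectors(pairs):
    """Map each tag to a sparse vector of item -> annotation count."""
    vectors = {}
    for tag, item in pairs:
        vectors.setdefault(tag, Counter())[item] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

vectors = tag_vectors(annotations)
sim_related = cosine(vectors["python"], vectors["programming"])
sim_unrelated = cosine(vectors["python"], vectors["recipes"])
```

Tags that annotate overlapping items score close to 1, while tags with no
items in common score 0. The score is symmetric, so it does not say which of
the two tags is the more general one.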
Schmitz et al. [22] use association rule mining to build a tree of related tags
from a CTS; however, they do not explain how the edges are built or what types
of relationships they model. We explain this in depth and also use lexico-syntactic
patterns and a search engine to detect accurate is-a and has-a relationships.
[24] extends [22] and [10] by considering the tag's context. Barla and
Bielikova [3] consider tag context similarly to [24].
The DAG algorithm [5] distinguishes between subjective and objective tags.
After calculating feature vectors for each objective tag, DAG places tags with
higher entropy in higher levels of abstraction. Like many other previous works,
DAG does not determine the type of relationship between concepts.
Lin et al. [19] build a subsumption graph from the folksonomy and use a
random walk to rank tags by generality. They place tags in the taxonomy
based on support and confidence between candidate nodes from the graph. They
consider only a single sense for each tag, which leads to missed relationships.
The authors claim that building transactions from the tags that specific users
associate with items leads to the best taxonomy because it preserves most of
the information. In contrast, we found that user information does not improve
taxonomy quality.
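Support and confidence, the two measures mentioned above, can be illustrated
with a small sketch. The transactions here are hypothetical sets of tags
attached to one item each; how [19] actually builds transactions and chooses
thresholds is not described in this section, so the code shows only the
standard association-rule definitions.

```python
# Hypothetical transactions: each is the set of tags applied to one item.
transactions = [
    {"animal", "dog"},
    {"animal", "dog", "puppy"},
    {"animal", "cat"},
    {"travel"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every tag in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) over the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base

conf_dog_animal = confidence({"dog"}, {"animal"}, transactions)
conf_animal_dog = confidence({"animal"}, {"dog"}, transactions)
```

In this toy data the rule dog -> animal has higher confidence than
animal -> dog. An asymmetry of this kind is what can suggest that one tag
subsumes the other.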
Korner et al. [14] categorize users by the kind of tags they use. They show that
excluding some users can reduce noise and improve precision. This improvement
is orthogonal to the contribution we make in this paper and is applicable in our
context as well. We leave adapting ONTECTAS to this as future work.
Hearst [9] defines a set of patterns that indicate is-a and has-a
relationships between words in text documents. [4,6] find patterns for
detecting is-a relationships from text corpora. To our knowledge, our work is
the first to extend the lexico-syntactic patterns to find relationships of any
type between tags in CTSs.
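To make the idea of a lexico-syntactic pattern concrete, the sketch below
matches one well-known Hearst-style pattern, "X such as Y", with a regular
expression. The choice of pattern, the restriction to single-word terms, and
the function name are simplifying assumptions; this is not the pattern set or
the matching procedure used by ONTECTAS.

```python
import re

# One Hearst-style pattern: "<parent> such as <child>".
# Single-word terms only, which is a deliberate simplification.
SUCH_AS = re.compile(r"\b(\w+),? such as (\w+)\b", re.IGNORECASE)

def find_is_a(text):
    """Return (child, parent) pairs suggested by the 'such as' pattern."""
    return [(m.group(2).lower(), m.group(1).lower())
            for m in SUCH_AS.finditer(text)]

pairs = find_is_a(
    "She likes languages such as Python. Pets, such as cats, need care."
)
```

Each match yields a candidate is-a edge from the second term to the first.
A realistic extractor would also need to handle multi-word terms, plural forms,
and lists of several children after "such as".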