ONTECTAS: Bridging the Gap between Collaborative Tagging Systems and Structured Data - Advanced Information Systems Engineering

Section 2 discusses related work. Section 3 formalizes the problem studied in this paper. Section 8 concludes and discusses future work.

2 Related Work

Several other works have studied extracting ontologies from CTSs. Some approaches [16,18,2] match CTS tags to concepts in general-purpose ontologies such as WordNet, resulting in a graph of tags. However, because CTSs are ad hoc and use terms dynamically, general-purpose ontologies miss many terms as well as edges (i.e., relationships). For example, our experiments show that WordNet misses more than 25% of the correct edges between concepts extracted from Del.icio.us, even when both the parent and child concepts are in WordNet.

Schmitz [23] constructs weighted graphs based on conditional probabilities between pairs of tags. His algorithm cannot identify the exact relationship (e.g., is-a) between terms: it only indicates that two terms are related, not how. By contrast, our algorithm pinpoints is-a and has-a relationships between terms.
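To make the kind of graph described for [23] concrete, the following Python sketch computes conditional probabilities between ordered pairs of tags from item-tag assignments. It assumes each item's tags are given as a set and that co-occurrence is counted per item; the function name and data layout are hypothetical, and this is not Schmitz's implementation.

```python
from collections import Counter
from itertools import permutations


def conditional_probability_graph(item_tags):
    """Return {(a, b): P(b | a)} for every ordered pair of co-occurring tags.

    item_tags maps each item (e.g. a bookmarked URL) to the tags assigned to it.
    P(b | a) is the fraction of items tagged with a that are also tagged with b.
    """
    tag_count = Counter()   # number of items carrying each tag
    pair_count = Counter()  # number of items carrying both tags of an ordered pair
    for tags in item_tags.values():
        unique = set(tags)
        tag_count.update(unique)
        # Ordered pairs, so both P(b | a) and P(a | b) are produced.
        pair_count.update(permutations(sorted(unique), 2))
    return {(a, b): n / tag_count[a] for (a, b), n in pair_count.items()}


items = {
    "url1": {"python", "programming"},
    "url2": {"java", "programming"},
    "url3": {"python"},
}
graph = conditional_probability_graph(items)
print(graph[("python", "programming")])  # 0.5
print(graph[("java", "programming")])    # 1.0
```

The edge from java to programming gets weight 1.0 and the reverse edge only 0.5, yet neither weight says whether the link is is-a, has-a, or something else, which is the limitation noted for [23].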

Heymann and Garcia-Molina [10] create an ontology by vectorizing the tags and computing the cosine similarity between them. However, their method places every tag from the similarity matrix into the taxonomy, which introduces many erroneous edges. Moreover, their work does not include an evaluation.
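The vectorize-and-compare step can be sketched in Python as below. Representing each tag by how often it is assigned to each item is an assumption on our part; [10] may construct or weight its vectors differently, the function names are hypothetical, and the sketch stops at the similarity matrix rather than reproducing their taxonomy-building step.

```python
import math
from collections import defaultdict


def tag_vectors(assignments):
    """Map each tag to a sparse vector {item: count} built from (item, tag) pairs."""
    vectors = defaultdict(dict)
    for item, tag in assignments:
        vectors[tag][item] = vectors[tag].get(item, 0) + 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Cosine similarity for every unordered pair of distinct tags."""
    tags = sorted(vectors)
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a in tags for b in tags if a < b
    }


assignments = [
    ("url1", "python"), ("url1", "programming"),
    ("url2", "python"), ("url2", "programming"),
    ("url3", "java"), ("url3", "programming"),
]
sims = similarity_matrix(tag_vectors(assignments))
print(round(sims[("programming", "python")], 3))  # 0.816
print(sims[("java", "python")])                   # 0.0
```

Note that cosine similarity is symmetric, so on its own it cannot say which of two similar tags should be the parent.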

Schmitz et al. [22] use association rule mining to build a tree of related tags from a CTS; however, they do not explain how the edges are built or what types of relationships they model. We explain this in depth and also use lexico-syntactic patterns and a search engine to detect accurate is-a and has-a relationships.
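Association rule mining over tags is usually phrased in terms of support and confidence. The Python sketch below mines pairwise rules under the assumption that each tagged item forms one transaction; the function name and thresholds are hypothetical, and because [22] does not specify how edges are derived from the rules, the sketch stops at the rules themselves.

```python
from collections import Counter
from itertools import combinations


def tag_association_rules(transactions, min_support, min_confidence):
    """Mine rules a -> b between pairs of tags.

    transactions: list of tag sets (here, one set per tagged item).
    support(a -> b)    = fraction of transactions containing both a and b
    confidence(a -> b) = count(a and b) / count(a)
    """
    n = len(transactions)
    single = Counter()
    pairs = Counter()
    for tags in transactions:
        tags = set(tags)
        single.update(tags)
        pairs.update(combinations(sorted(tags), 2))
    rules = {}
    for (x, y), count in pairs.items():
        support = count / n
        if support < min_support:
            continue  # prune infrequent pairs, as in Apriori-style mining
        for a, b in ((x, y), (y, x)):  # test both rule directions
            confidence = count / single[a]
            if confidence >= min_confidence:
                rules[(a, b)] = (support, confidence)
    return rules


transactions = [
    {"python", "programming"},
    {"python", "programming"},
    {"java", "programming"},
    {"python"},
]
rules = tag_association_rules(transactions, min_support=0.25, min_confidence=0.6)
print(rules[("java", "programming")])    # (0.25, 1.0)
print(("programming", "java") in rules)  # False
```

Here java -> programming survives with confidence 1.0 while programming -> java is pruned, which hints at a direction but still not at a relationship type.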

The approach in [24] extends [22] and [10] by considering each tag's context. Barla and Bielikova [3] consider tag context in a similar way to [24].

The DAG algorithm [5] distinguishes between subjective and objective tags. After calculating feature vectors for each objective tag, DAG places tags with higher entropy at higher levels of abstraction. Like many previous works, DAG does not determine the type of relationship between concepts.
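To illustrate the entropy criterion, the Python sketch below computes the Shannon entropy of each tag's feature vector and orders tags from most to least abstract. Using co-occurrence counts with other tags as the feature vector is an assumption, not necessarily how [5] builds its vectors; the counts and function names are hypothetical, and the subjective/objective filtering step is omitted.

```python
import math


def entropy(feature_vector):
    """Shannon entropy (in bits) of a non-negative feature vector,
    normalized into a probability distribution."""
    total = sum(feature_vector.values())
    h = 0.0
    for weight in feature_vector.values():
        if weight > 0:
            p = weight / total
            h -= p * math.log2(p)
    return h


def order_by_abstraction(features):
    """Sort tags so that higher-entropy (broader) tags come first."""
    return sorted(features, key=lambda tag: entropy(features[tag]), reverse=True)


# Hypothetical co-occurrence counts for three objective tags.
features = {
    "programming": {"python": 4, "java": 4, "web": 4, "tutorial": 4},
    "python": {"programming": 6, "tutorial": 2},
    "django": {"python": 8},
}
print(order_by_abstraction(features))  # ['programming', 'python', 'django']
```

A tag spread evenly over many contexts gets high entropy and lands near the top, but the resulting order still carries no relationship labels.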

Lin et al. [19] build a subsumption graph from the folksonomy and use a random walk to rank tags by generality. They place tags in the taxonomy based on the support and confidence between candidate nodes from the graph. They consider only a single sense for each tag, which leads to missed relationships. The authors claim that building transactions from the tags that specific users assign to items yields the best taxonomy, because it preserves the most information. In contrast, we found that user information does not improve taxonomy quality.
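The generality-ranking step can be approximated with a PageRank-style random walk that moves from specific tags toward the tags that subsume them, so probability mass accumulates at general tags. The Python sketch below is a stand-in under that assumption: the edge weights, damping factor, and function name are hypothetical, and [19]'s exact walk may differ.

```python
def generality_ranking(edges, damping=0.85, iterations=100):
    """Rank tags by generality with a PageRank-style random walk.

    edges: {child: {parent: weight}}, where each weight expresses how strongly
    child is subsumed by parent. Walking from specific to general tags makes
    probability mass accumulate at general tags.
    """
    nodes = set(edges)
    for parents in edges.values():
        nodes.update(parents)
    nodes = sorted(nodes)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Teleport mass keeps the walk well defined on any graph.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = edges.get(v, {})
            total = sum(out.values())
            if total == 0:
                # Node with no parent: spread its mass uniformly.
                for u in nodes:
                    nxt[u] += damping * score[v] / n
            else:
                for parent, w in out.items():
                    nxt[parent] += damping * score[v] * w / total
        score = nxt
    return sorted(nodes, key=lambda v: score[v], reverse=True)


# Hypothetical subsumption edges (child -> parent).
edges = {
    "django": {"python": 1.0},
    "flask": {"python": 1.0},
    "python": {"programming": 1.0},
    "java": {"programming": 1.0},
}
print(generality_ranking(edges)[:2])  # ['programming', 'python']
```

Tags ranked this way could then be inserted top-down, with support and confidence deciding which already-placed node becomes each tag's parent.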

Korner et al. [14] categorize users by the kinds of tags they use. They show that excluding some users can reduce noise and improve precision. This improvement is orthogonal to the contribution of this paper and is applicable in our context as well. We leave adapting ONTECTAS to incorporate such user filtering as future work.

Hearst [9] defines a set of patterns that indicate is-a and has-a relationships between words in text documents. The works in [4,6] find patterns for detecting is-a relationships from text corpora. To our knowledge, our work is the first to extend lexico-syntactic patterns to find relationships of any type between tags in CTSs.
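The pattern-matching idea can be sketched in Python with regular expressions applied to text snippets, for example search-engine results retrieved for a pair of tags. The patterns below are single-word simplifications of Hearst-style patterns, the has-a pattern is purely illustrative, and the names `IS_A_PATTERNS`, `HAS_A_PATTERNS`, and `count_matches` are hypothetical; a real system would match noun phrases rather than single words.

```python
import re

# Simplified Hearst-style is-a patterns; {general} and {specific}
# are filled in with the two (regex-escaped) tags being tested.
IS_A_PATTERNS = [
    r"\b{general}s? such as (?:\w+, )*{specific}\b",
    r"\b{specific} and other {general}s?\b",
    r"\b{specific} is an? {general}\b",
]

# Illustrative part-whole pattern (an assumption, not taken from [9]).
HAS_A_PATTERNS = [
    r"\b{whole} has an? {part}\b",
]


def count_matches(patterns, text, **tags):
    """Count pattern instances in text for the given tag pair (case-insensitive)."""
    escaped = {name: re.escape(tag) for name, tag in tags.items()}
    total = 0
    for pattern in patterns:
        regex = pattern.format(**escaped)
        total += len(re.findall(regex, text, flags=re.IGNORECASE))
    return total


text = ("Languages such as Java, Python are popular. "
        "Python and other languages run here. "
        "Python is a language. A car has a wheel.")
print(count_matches(IS_A_PATTERNS, text, general="language", specific="python"))  # 3
print(count_matches(IS_A_PATTERNS, text, general="python", specific="language"))  # 0
print(count_matches(HAS_A_PATTERNS, text, whole="car", part="wheel"))             # 1
```

Unlike co-occurrence statistics, the match counts are directional and tied to a relationship type, so comparing the two directions of a tag pair suggests which tag is the parent and how the two are related.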