Figures 1(c) and 1(d) measure the depth of the validated ontology detected by each algorithm for both is-a (lower bars) and any relationships (higher bars). These measures quantify the richness of the ontology. If there are multiple paths from the root to a node n, the depth is the length of the longest path. Because the other algorithms find just any relationship between elements in an ontology, rather than determining the types of relationships as ONTECTAS does, we measure both the is-a relationships and any relationships found. We do not consider has-a, since no other algorithms detect it. Notice that this gives an advantage to the competing algorithms. For the depth metrics, other algorithms usually find a long chain with a combination of synonyms and is-a relationships. Since ONTECTAS detects mostly is-a or has-a (and not synonyms), the maximum depth for any relationship in ONTECTAS is close to the maximum depth for is-a relationships, because, in general, chains containing both is-a and has-a are rare. For is-a relationships, ONTECTAS has the highest maximum depth for two out of three datasets. In the full version of the paper [20], we show that the average number of children is similar to the average depth. For the average number of children, ONTECTAS has the best performance for CiteULike, is roughly tied for LibraryThing, and is second best for Del.icio.us.
Even when competing algorithms are given credit for any relationships and ONTECTAS only for finding is-a, ONTECTAS performs fairly well. This is because there are so many is-a relationships detected as compared to the other relationship types.
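The depth metric defined above (with multiple root-to-node paths, take the longest) can be sketched as a longest-path computation over the ontology viewed as a DAG. This is an illustrative reconstruction, not the paper's code; the edge-list representation and function names are assumptions.

```python
# Sketch of the depth metric: with multiple paths from the root to a node,
# depth is the length of the longest one. Hypothetical parent->child
# edge-list representation; not the paper's actual implementation.
from functools import lru_cache

def max_depth(edges, root):
    """Return the longest root-to-leaf path length in a DAG ontology."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    @lru_cache(maxsize=None)
    def depth(node):
        kids = children.get(node, [])
        if not kids:
            return 0
        return 1 + max(depth(k) for k in kids)

    return depth(root)

# Example: root->a->c->d and root->b->c->d; the longest path has length 3.
edges = [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
print(max_depth(edges, "root"))  # 3
```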
For all of the depth/children metrics, we note that all algorithms perform markedly better using our preprocessing step of removing verb phrases. This step was particularly helpful in removing non-ontological tags such as "to-read" in the Del.icio.us dataset. By applying it to all algorithms, we improved the performance of every algorithm, not just ONTECTAS's. Figure 1 also shows that most of the algorithms performed better on most measures for the Del.icio.us and LibraryThing datasets than on CiteULike. This confirms that the tags in these datasets are of better quality than the ones in CiteULike, and shows that we can compare different CTSs on the quality of their tagging actions using an ontology-creation algorithm.
In summary, ONTECTAS outperforms the four other algorithms on precision and relative recall for is-a relationships, and does well on the structural metrics of maximum depth, average depth, and average number of children.
7.5 Comparing with a Gold Standard
Following [19], we compared how the algorithms extracted is-a relationships against a "gold standard" ontology: the concept hierarchy from the Open Directory Project (ODP)6. To judge precision, recall, and F-measure, we use the lexical and taxonomic metrics from [19]. The lexical metrics measure how well the algorithms did in recreating the concepts, and the taxonomic metrics show how well the algorithms did in recreating the structure. Notice that comparing against a static ontology treated as a gold standard has its problems, since it
6 http://dmoz.org
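The lexical comparison described above can be sketched as set overlap between extracted and gold-standard concept labels. This is a minimal illustration under assumed inputs; the taxonomic metrics from [19], which additionally compare the is-a structure, are omitted here.

```python
# Minimal sketch of lexical precision/recall/F-measure: treat both
# ontologies as sets of concept labels and score their overlap.
# Illustrative only; omits the taxonomic (structural) metrics from [19].

def lexical_prf(extracted, gold):
    """Precision, recall, and F1 of extracted concept labels vs. the gold set."""
    extracted, gold = set(extracted), set(gold)
    overlap = len(extracted & gold)
    precision = overlap / len(extracted) if extracted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example: 3 of 4 extracted concepts appear in a
# 6-concept gold standard.
p, r, f = lexical_prf({"art", "music", "jazz", "toread"},
                      {"art", "music", "jazz", "film", "opera", "rock"})
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.5 0.6
```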