Figures 1(c) and 1(d) measure the depth of the validated ontology detected by each algorithm for both is-a (lower bars) and any relationships (higher bars). These measures quantify the richness of the ontology. If there are multiple paths from the root to a node n, the depth is the length of the longest path. Because the other algorithms find just any relationship between elements in an ontology, rather than determining the types of relationships as ONTECTAS does, we measure both the is-a relationships and any relationships found. We do not consider has-a, since no other algorithms detect it. Notice that this gives an advantage to the competing algorithms. For the depth metrics, other algorithms usually find a long chain with a combination of synonyms and is-a relationships. Since ONTECTAS detects mostly is-a or has-a (and not synonyms), the maximum depth for any relationship in ONTECTAS is close to the maximum depth for is-a relationships, because, in general, chains containing both is-a and has-a are rare. For is-a relationships, ONTECTAS has the highest maximum depth for two out of three datasets. In the full version of the paper [20], we show that the average number of children is similar to the average depth. For the average number of children, ONTECTAS has the best performance for CiteULike, is roughly tied for LibraryThing, and is second best for Del.icio.us.
Even when competing algorithms are given credit for any relationships and ONTECTAS only for finding is-a, ONTECTAS performs fairly well. This is because there are so many is-a relationships detected as compared to the other relationship types.
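The depth metric defined above (with multiple root-to-node paths, take the longest) can be sketched as a longest-path computation over the ontology viewed as a DAG. This is an illustrative reconstruction, not the paper's code; the edge-list representation and function names are assumptions.

```python
# Sketch of the depth metric: with multiple paths from the root to a node,
# depth is the length of the longest one. Hypothetical parent->child
# edge-list representation; not the paper's actual implementation.
from functools import lru_cache

def max_depth(edges, root):
    """Return the longest root-to-leaf path length in a DAG ontology."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    @lru_cache(maxsize=None)
    def depth(node):
        kids = children.get(node, [])
        if not kids:
            return 0
        return 1 + max(depth(k) for k in kids)

    return depth(root)

# Example: root->a->c->d and root->b->c->d; the longest path has length 3.
edges = [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
print(max_depth(edges, "root"))  # 3
```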
For all of the depth/children metrics, we note that all algorithms perform markedly better using our preprocessing step of removing verb phrases. This step was particularly helpful in removing non-ontological tags such as "to-read" in the Del.icio.us dataset. By applying it to all algorithms, we improved the performance of every algorithm, not just ONTECTAS's. Figure 1 also shows that most of the algorithms performed better on most measures for the Del.icio.us and LibraryThing datasets than on CiteULike. This confirms that the tags in these datasets are of better quality than the ones in CiteULike, and shows that we can compare different CTSs on the quality of their tagging actions using an ontology-creation algorithm.
In summary, ONTECTAS outperforms the four other algorithms on precision and relative recall for is-a relationships, and does well on the structural metrics of maximum depth, average depth, and average number of children.
7.5 Comparing with a Gold Standard
Following [19], we compared how the algorithms extracted is-a relationships against a "gold standard" ontology: the concept hierarchy from the Open Directory Project (ODP)6. To judge precision, recall, and F-measure, we use the lexical and taxonomic metrics from [19]. The lexical metrics measure how well the algorithms did in recreating the concepts, and the taxonomic metrics show how well the algorithms did in recreating the structure. Notice that comparing against a static ontology treated as a gold standard has its problems, since it
6 http://dmoz.org
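The lexical comparison described above can be sketched as set overlap between extracted and gold-standard concept labels. This is a minimal illustration under assumed inputs; the taxonomic metrics from [19], which additionally compare the is-a structure, are omitted here.

```python
# Minimal sketch of lexical precision/recall/F-measure: treat both
# ontologies as sets of concept labels and score their overlap.
# Illustrative only; omits the taxonomic (structural) metrics from [19].

def lexical_prf(extracted, gold):
    """Precision, recall, and F1 of extracted concept labels vs. the gold set."""
    extracted, gold = set(extracted), set(gold)
    overlap = len(extracted & gold)
    precision = overlap / len(extracted) if extracted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example: 3 of 4 extracted concepts appear in a
# 6-concept gold standard.
p, r, f = lexical_prf({"art", "music", "jazz", "toread"},
                      {"art", "music", "jazz", "film", "opera", "rock"})
print(round(p, 2), round(r, 2), round(f, 2))  # 0.75 0.5 0.6
```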