Automatic Evaluation of Ontologies - Natural Language Processing and Text Mining

Information Technology Reference

In-Depth Information

be a large disagreement. We use the formula for OntoRandIdx ( U, V ) given above,

where the functions δ U and δ V take this intuition into account. That is, rather than

returning merely 1 or 0 depending on whether the given two clusters are the same

or not, the functions δ U and δ V should return a real number from the range [0 , 1],

expressing a measure of how closely related the two clusters are.

By plugging in various definitions of the functions δ U and δ V , we can obtain a

family of similarity measures for ontologies, suitable for comparing an ontology with

the gold standard in the context of the task that has been discussed in Section 11.4.1.

We propose two concrete families of δ U and δ V . Since the definitions of δ U and δ V

will always be analogous to each other and differ only in the fact that each applies

to a different ontology, we refer only to the δ U function in the following discussion.

Similarity Based on Common Ancestors

One possibility is inspired by the approach that is sometimes used to evaluate the

performance of classification models for classification in hierarchies (see, e.g., [19]),

and that could incidentally also be useful in the context of e.g., evaluating an auto-

matic ontology population system. Given a concept U i in the ontology U ,let A ( U i )

be the set of all ancestors of this concept, i.e., all concepts on the path from the

root to U i (including U i itself). If two concepts U i and U j have a common parent,

the sets A ( U i ) and A ( U j ) will have a large intersection; on the other hand, if they

have no common parent except the root, the intersection of A ( U i ) and A ( U j ) will

contain only the root concept. Thus the size of the intersection can be taken as a

measure of how closely related the two concepts are.

δ U ( U i ,U j )= |A ( U i ) ∩ A ( U j ) |/|A ( U i ) ∪ A ( U j ) |.

(11.2)

This measure (also known as the Jaccard coe cient) has the additional nice char-

acteristic that it can be extended to cases where U is not a tree but an arbitrary

directed acyclic graph. If the arrows in this graph point from parents to children,

the set A ( U i ) is simply the set of all nodes from which U is reachable.

Similarity Based on Distance in the Tree

An alternative way to define a suitable function δ U would be to work directly with

the distances between U i and U j in the tree U . In this case, let l be the distance

between U i and U j in the tree (length of the path from U i to the common ancestor

of U i and U j , and thence down to U j ), and h be the depth of the deepest common

ancestor of U i and U j .If l is large, this is a sign that U i and U j are not very closely

related; similarly, if h is small, this is a sign that U i and U j don't have any common

ancestors except very general concepts close to the root, and therefore U i and U j

aren't very closely related. There are various ways of taking these intuitions into

account in a formula for δ U as a function of l and h . For example, Rada et al. [23]

have proposed a distance measure of the form:

δ ( l, h )= e −αl tanh( βh )

(11.3)

Here, α and β are nonnegative constants, and tanh is the hyperbolic tangent

tanh( x )=( e x

− e −x ) / ( e x + e −x )=1 − 2 / (1 + e 2 x ) .

Natural Language Processing and Text Mining

Search WWH ::

Custom Search

Home