same, while the score 0 is possible when the closest common concept is
the root node.
A different measure based on semantic distance and depth was proposed by Leacock and Chodorow (1998). According to this approach, the similarity between two concepts $E_1$ and $E_2$ is the negative logarithm of the number of nodes along the shortest path between them, $D_{node}(E_1, E_2)$, divided by twice the maximum depth $D$ (from the lowest node to the root) of the taxonomy in which $E_1$ and $E_2$ occur:

$$\mathrm{sim}(E_1, E_2) = -\log \frac{D_{node}(E_1, E_2)}{2D}$$

Since nodes rather than edges are counted, and both endpoints are included, the path between two siblings, i.e., two nodes with the same parent node, contains three nodes. For instance, the Leacock and Chodorow similarity between River and Aqueduct is $-\log(5/(2 \cdot 6)) = -\log(5/12) \approx 0.38$, with the logarithm taken to base 10.
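The counting convention and the arithmetic above can be checked with a minimal sketch. It assumes the taxonomy is stored as a child-to-parent mapping; the function names and the tiny sibling tree are hypothetical, and base 10 is used only because it reproduces the value 0.38 in the text.

```python
import math
from collections import deque


def path_node_count(parent, a, b):
    """Nodes on the shortest path between a and b, both endpoints included.

    The taxonomy is given as a hypothetical child -> parent mapping.
    """
    adjacency = {}
    for child, par in parent.items():  # treat every link as undirected
        adjacency.setdefault(child, []).append(par)
        adjacency.setdefault(par, []).append(child)
    dist = {a: 0}  # breadth-first search, distances counted in edges
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node] + 1  # a path of k edges visits k + 1 nodes
        for nxt in adjacency.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError(f"{a!r} and {b!r} are not connected")


def leacock_chodorow(path_nodes, max_depth, base=10):
    """-log(D_node / (2 * D)); base 10 reproduces the worked example."""
    return -math.log(path_nodes / (2 * max_depth), base)


# Two siblings under a hypothetical parent: the path x - p - y has three nodes
print(path_node_count({"x": "p", "y": "p", "p": "root"}, "x", "y"))  # 3
# River and Aqueduct, with the values from the text: 5 nodes, maximum depth 6
print(round(leacock_chodorow(5, 6), 2))  # 0.38
```

Keeping path counting and the logarithm in separate functions makes it easy to switch to natural logarithms, as other presentations of the measure do, without touching the traversal.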
The main limitation of node/edge counting approaches lies in the underlying assumption, made when defining the taxonomy, that the distances between adjacent nodes are equivalent at every level. For instance, according to these approaches the distances of the nodes Seasonal_river, Dry_riverbed and Perennial_river from the root node Water_system coincide: they are equal to 4 with edge counting and 5 with node counting. Similarly, their distances from the node River_system are 2 and 3 with edge and node counting, respectively.
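The following sketch reproduces only the distances stated above, to show that pure counting cannot tell these three concepts apart. The intermediate nodes "placeholder_1" and "placeholder_2" are hypothetical stand-ins for nodes not named in the text, and the three concepts are attached to a single placeholder parent for simplicity; the actual taxonomy may arrange them differently.

```python
# Hypothetical fragment (child -> parent): only the depths stated in the text
# are reproduced; "placeholder_1" and "placeholder_2" stand in for unknown nodes.
TAXONOMY = {
    "Seasonal_river": "placeholder_2",
    "Dry_riverbed": "placeholder_2",
    "Perennial_river": "placeholder_2",
    "placeholder_2": "River_system",
    "River_system": "placeholder_1",
    "placeholder_1": "Water_system",
}


def edge_distance(parent, node, ancestor):
    """Number of edges walked from node up to ancestor."""
    steps = 0
    while node != ancestor:
        if node not in parent:
            raise ValueError(f"{ancestor!r} is not an ancestor")
        node = parent[node]
        steps += 1
    return steps


def node_distance(parent, node, ancestor):
    """Number of nodes on the same path, counting both endpoints."""
    return edge_distance(parent, node, ancestor) + 1


for leaf in ("Seasonal_river", "Dry_riverbed", "Perennial_river"):
    # each line prints: <name> 4 5 2 3
    print(leaf,
          edge_distance(TAXONOMY, leaf, "Water_system"),
          node_distance(TAXONOMY, leaf, "Water_system"),
          edge_distance(TAXONOMY, leaf, "River_system"),
          node_distance(TAXONOMY, leaf, "River_system"))
```

Every concept at the same level receives identical scores, whatever its meaning, which is exactly the uniform-distance assumption described above.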
In the 1990s, a different approach, the information content (or node-based) approach, was introduced (Resnik 1995; 1999) and subsequently refined by Lin (1998). Essentially, it relies on associating probabilities with the nodes of the taxonomy. In Lin's formulation, the similarity between two concepts is measured by the ratio between twice the amount of information shared by the concepts and the sum of the amounts of information of the two concepts. This approach is recalled in the Similarity Methods section. Compared with other existing proposals, the Lin approach shows a higher correlation with human judgment, as discussed in Jiang and Conrath (1997).
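A minimal sketch of the information content approach follows, assuming probabilities are estimated in the spirit of Resnik (1995): each concept's probability is the share of corpus occurrences falling on it or on any of its descendants, and information content is $-\log p$. The taxonomy, the counts and all function names are hypothetical. Lin's similarity is then twice the information content of the lowest common subsumer divided by the sum of the information contents of the two concepts (Resnik's own measure is the information content of the subsumer alone).

```python
import math

# Hypothetical taxonomy (child -> parent) and corpus counts; illustrative only.
PARENT = {
    "river": "watercourse",
    "canal": "watercourse",
    "watercourse": "entity",
    "lake": "water_body",
    "water_body": "entity",
}
COUNTS = {"river": 30, "canal": 10, "watercourse": 10,
          "lake": 40, "water_body": 10, "entity": 0}


def ancestors(parent, concept):
    """The concept itself followed by its ancestors up to the root."""
    chain = [concept]
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain


def concept_probabilities(parent, counts):
    """p(c): share of all occurrences that fall on c or on any descendant of c."""
    freq = dict.fromkeys(counts, 0)
    for concept, n in counts.items():
        for node in ancestors(parent, concept):
            freq[node] += n  # an occurrence also counts for every ancestor
    total = max(freq.values())  # the root accumulates every occurrence
    return {c: f / total for c, f in freq.items()}


def information_content(p):
    """IC(c) = -log p(c), written as log(1/p) so that p = 1 gives exactly 0.0."""
    return math.log(1.0 / p)


def lin_similarity(parent, prob, c1, c2):
    """2 * IC(lowest common subsumer) / (IC(c1) + IC(c2))."""
    shared = set(ancestors(parent, c2))
    # in a tree, the first ancestor of c1 shared by c2 is the lowest common subsumer
    lcs = next(a for a in ancestors(parent, c1) if a in shared)
    denom = information_content(prob[c1]) + information_content(prob[c2])
    if denom == 0:
        return 1.0  # both concepts have probability 1 (e.g. both are the root)
    return 2 * information_content(prob[lcs]) / denom


prob = concept_probabilities(PARENT, COUNTS)
print(lin_similarity(PARENT, prob, "river", "canal"))  # roughly 0.395
print(lin_similarity(PARENT, prob, "river", "lake"))   # 0.0
```

Because the root has probability 1 and therefore zero information content, two concepts whose only common subsumer is the root get similarity 0. Concepts with a zero count and no counted descendants would need smoothing before taking the logarithm; that case is left out of this sketch.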
However, feature-based (or tuple) similarity models are the most prominent, and one of the key approaches is Dice's function (Maarek et al. 1991; Rasmussen 1992; Castano et al. 1998). It provides a similarity coefficient between feature vectors and is given by the ratio between twice the number of features common to the two vectors and the sum of the numbers of features of each vector. Although Dice's function is the most commonly used approach, it does not compute tuple similarity by explicitly considering the similarity degrees of the components. The similarity degrees of components can, however, be taken into account through the notion of information content similarity; this approach is recalled in the Similarity Methods section.
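Dice's function can be sketched over sets of features. The feature names below are hypothetical, and returning 1.0 for two empty descriptions is an assumed convention, not part of the original definition.

```python
def dice(features_a, features_b):
    """Dice coefficient: 2 * |A & B| / (|A| + |B|) over two feature collections."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # assumption: two empty descriptions count as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical feature descriptions, for illustration only
river = {"flowing_water", "natural", "channel"}
canal = {"flowing_water", "artificial", "channel"}
print(dice(river, canal))  # 2 * 2 / (3 + 3), about 0.667

# Component similarity is ignored: related but distinct features do not match
print(dice({"stream"}, {"river"}))  # 0.0
```

The second call illustrates the limitation noted above: features are matched only by identity, so closely related components contribute nothing unless a component-level similarity, such as the information content measure, is introduced.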
Research in the field of GIScience has developed alongside the broader literature on similarity. Similarity has a long tradition in GIScience, where it has mainly been discussed from the spatial point of view. In