from different sources (Batini et al. 1986; Chen et al. 2003). The datasets should
overlap in both subject matter and the area they describe. In a preprocessing filtering
stage, objects that do not belong to the intersection of the datasets are
eliminated. The studies assumed, however, that the overlap between the datasets after
the filtering stage (i.e., the number of real-world entities that appear in both sets) was
not complete, i.e., some real-world entities appeared in only one dataset
(Batini et al. 1986; Beeri et al. 2005; Safra and Doytsher 2006a; Butenuth and
Heipke 2005). When geographic datasets are integrated, the main task has been to
identify when two or more objects from different sources represent the same
entity and to fuse those objects into a single object. Since each entity was represented by at
most one object in a dataset, a fusion set contained at most one object from each dataset
(Batini et al. 1986; Chen et al. 2003). Thus, the matching between objects from two
datasets is one-to-one (1:1).
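For illustration, a 1:1 matching between two point datasets can be produced with a simple location-based rule: accept a pair only when each point is the other's nearest neighbor. This is a minimal sketch chosen for illustration, not the matching method of the cited studies. The function names `nearest` and `mutual_nearest_pairs`, the planar (x, y) coordinates and the sample data are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest(p: Point, candidates: List[Point]) -> int:
    """Return the index of the candidate closest to p (Euclidean distance).

    Ties are broken in favor of the lowest index, so the result is deterministic.
    """
    return min(range(len(candidates)), key=lambda i: math.dist(p, candidates[i]))


def mutual_nearest_pairs(a: List[Point], b: List[Point]) -> List[Tuple[int, int]]:
    """Hypothetical helper: pair a[i] with b[j] only if each is the other's nearest point.

    Every index appears in at most one pair, so the result is a partial 1:1
    matching; unpaired objects are treated as entities present in one dataset only.
    """
    if not a or not b:
        return []
    pairs = []
    for i, p in enumerate(a):
        j = nearest(p, b)
        if nearest(b[j], a) == i:  # accept only mutual nearest neighbors
            pairs.append((i, j))
    return pairs


a = [(0.0, 0.0), (10.0, 0.0), (20.0, 5.0)]
b = [(0.5, 0.2), (9.6, 0.4)]
print(mutual_nearest_pairs(a, b))  # prints [(0, 0), (1, 1)]
```

Here `a[2]` has no partner, which corresponds to an entity that appears in only one dataset. Because `b[j]` has exactly one nearest point in `a`, no object can take part in two accepted pairs.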
Several papers (Masuyama 2006; Sagayaraj et al. 2006; Sripada et al. 2004) point
out that integrating geospatial data from heterogeneous sources has
many important applications. One example is combining up-to-date
data, perhaps from a satellite image, with data from a map that contains verbal
descriptions of entities (Zhang et al. 2003; Ziegler and Dittrich 2004). When
geographical entities are represented in different sources, and each source stores
different properties of the entities, integration makes it possible to obtain all the
available information on each entity. Integration is essentially a join of datasets.
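Viewed as a join, integration fuses each pair of corresponding objects and keeps unmatched objects, so that all available properties of every entity are retained. The sketch below assumes, for illustration only, that two objects correspond when they lie within a fixed distance `max_dist` of each other; a fixed threshold is a crude stand-in for a real correspondence test. The names `GeoObject`, `location_join`, `max_dist` and the sample records (a roof type from imagery, a name from a map) are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class GeoObject:
    """A point object with source-specific properties (hypothetical record layout)."""
    x: float
    y: float
    props: Dict[str, str] = field(default_factory=dict)


def location_join(left: List[GeoObject], right: List[GeoObject],
                  max_dist: float) -> List[Dict[str, str]]:
    """Full outer join of two sources by location (illustrative sketch).

    Each left object is fused with its nearest still-unused right object,
    provided that object lies within max_dist. Unmatched objects from either
    side are kept on their own, so every object contributes to exactly one row.
    """
    used: Set[int] = set()
    result: List[Dict[str, str]] = []
    for obj in left:
        best: Optional[int] = None
        best_d = math.inf
        for j, other in enumerate(right):
            if j in used:
                continue  # keep the join 1:1
            d = math.dist((obj.x, obj.y), (other.x, other.y))
            if d <= max_dist and d < best_d:
                best, best_d = j, d
        merged = dict(obj.props)
        if best is not None:
            used.add(best)
            merged.update(right[best].props)  # combine properties from both sources
        result.append(merged)
    # Right-side objects that found no partner are entities seen only in that source.
    result.extend(dict(o.props) for j, o in enumerate(right) if j not in used)
    return result


satellite = [GeoObject(0.0, 0.0, {"roof": "flat"}),
             GeoObject(50.0, 50.0, {"roof": "gabled"})]
town_map = [GeoObject(1.0, 1.0, {"name": "Town Hall"}),
            GeoObject(200.0, 0.0, {"name": "Old Mill"})]

for row in location_join(satellite, town_map, max_dist=5.0):
    print(row)
# prints {'roof': 'flat', 'name': 'Town Hall'}
# prints {'roof': 'gabled'}
# prints {'name': 'Old Mill'}
```

The greedy, order-dependent choice keeps the result 1:1 but is not guaranteed to find the best overall matching; it only illustrates the join view of integration.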
The main task in a join is to find all sets of corresponding objects, i.e., objects that
represent the same real-world entity in distinct sources (Papakonstantinou
et al. 1996; Park 2001; Safra and Doytsher 2006b). Over heterogeneous sources,
however, finding corresponding objects is difficult, since there are no global
identifiers. In principle, both spatial and non-spatial properties may be used in
lieu of global identifiers for integrating geographical data. However, location is the only
property that is always available for spatial objects (Sattler et al. 2000; Walter and Fritsch 1999;
Devogele 2002). Since locations often uniquely identify objects within a
dataset, a location-based join may seem to be an easy task. This is not so, however, for
several reasons. First, measurements introduce errors, and the errors in different
datasets are independent of each other. Second, each organization has its own
approach and requirements. Hence, different organizations use different measurement
techniques and may record the spatial properties of entities at a different scale
or in a different structure (Friis-Christensen et al. 2005; GSDI 2005; Hampe
et al. 2004). For example, one organization might represent buildings as points,
while another might represent them as polygons. Although a point location can be
estimated from a polygonal shape, it may not agree with the point location recorded in
another database. Third, cartographic generalization can displace objects from their
true positions. For these reasons, location-based joins do not provide a
precise answer but rather an approximation. The quality of the approximation is
determined by characteristics of the joined datasets, such as the sizes of the errors, the
density of objects, and the relative overlap. In this paper, we introduce algorithms
for the location-based join of three or more sources, under the assumptions that
locations are given as points and that each dataset has at most one object per
real-world entity. The rationale underlying all our algorithms is that even in the presence