Interlinking Opensource Geo-Spatial Datasets for Optimal Utility in Ranking - Modern Trends in Cartography

from different sources (Batini et al. 1986 ; Chen et al. 2003 ). The datasets should

overlap on both the subject and the described area. In a preprocessing filtering

stage, objects that do not belong to the intersection between the datasets are

eliminated. The studies assumed, however, that the overlap (i.e. the number of world

entities that appear in both sets) between the datasets, after the filtering stage, was

not complete, i.e. there existed world entities which appeared only in one dataset

(Batini et al. 1986 ; Beeri et al. 2005 ; Safra and Doytsher 2006a ; Butenuth and

Heipke 2005 ). When geographic datasets are integrated, the main task has been to

identify when two or more objects, from the different sources, represent the same

entity and fuse those objects into a single object. Since each entity was represented by at most

one object in a dataset, a fusion set contained at most one object from each dataset

(Batini et al. 1986 ; Chen et al. 2003 ). Thus, the matching between objects from two

datasets was shown to be 1:1.
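
The 1:1 matching with incomplete overlap described above can be illustrated with a minimal sketch. It assumes that objects are given as point locations and pairs two objects only when each is the other's nearest neighbour within a distance threshold. The function name `mutual_nearest_matches`, the threshold `max_dist`, and the sample coordinates are hypothetical and chosen for illustration; this is not the method of the cited studies.

```python
from math import hypot


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return hypot(p[0] - q[0], p[1] - q[1])


def nearest(p, candidates):
    """Return the id of the candidate closest to p, or None if there are none."""
    return min(candidates, key=lambda k: distance(p, candidates[k]), default=None)


def mutual_nearest_matches(a, b, max_dist):
    """Pair objects of a and b that are each other's nearest neighbour.

    a, b: dicts mapping object id -> (x, y) location.
    max_dist: assumed cut-off; objects farther apart are never paired.
    Returns (pairs, unmatched_a, unmatched_b). Every object appears in at
    most one pair, so the matching is 1:1, and objects present in only one
    dataset remain unmatched.
    """
    pairs = []
    for id_a, p_a in a.items():
        id_b = nearest(p_a, b)
        if id_b is None or distance(p_a, b[id_b]) > max_dist:
            continue
        # Keep the pair only if the choice is reciprocal.
        if nearest(b[id_b], a) == id_a:
            pairs.append((id_a, id_b))
    unmatched_a = sorted(set(a) - {i for i, _ in pairs})
    unmatched_b = sorted(set(b) - {j for _, j in pairs})
    return pairs, unmatched_a, unmatched_b


# Hypothetical sample data: two entities appear in both datasets,
# and one entity appears in each dataset only.
dataset_a = {"a1": (0.0, 0.0), "a2": (10.0, 10.0), "a3": (50.0, 50.0)}
dataset_b = {"b1": (0.5, 0.2), "b2": (10.3, 9.8), "b3": (100.0, 0.0)}

pairs, only_a, only_b = mutual_nearest_matches(dataset_a, dataset_b, max_dist=2.0)
print(pairs, only_a, only_b)
```

Each pair corresponds to a fusion set with one object from each dataset, while the unmatched lists hold the entities that appear in only one source.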

Several papers (Masuyama 2006 ; Sagayaraj et al. 2006 ; Sripada et al. 2004 ) point

out that integration of geo-spatial data from heterogeneous sources has

many important applications. One example they showed was combining up-to-date

data, for instance from a satellite image, with data from a map that contains verbal

descriptions of entities (Zhang et al. 2003 ; Ziegler and Dittrich 2004 ). When

geographical entities are represented in different sources, and each source stores

different properties of the entities, integration makes it possible to obtain all the

available information on each entity. Integration is essentially a join of datasets.

The main task in a join is to find all sets of corresponding objects, i.e., objects that

represent the same real-world entity in distinct sources (Papakonstantinou

et al. 1996 ; Park 2001 ; Safra and Doytsher 2006b ). Over heterogeneous sources,

however, finding corresponding objects is difficult, since there are no global

identifiers. In principle, both spatial and non-spatial properties may be used, in

lieu of global identifiers, for integrating geographical data. However, only location

is always available for spatial objects (Sattler et al. 2000 ; Walter and Fritsch 1999 ;

Devogele 2002 ). Since locations uniquely identify objects in a dataset in many

cases, a location-based join seems to be an easy task. This is not so, however, for

several reasons. First, measurements introduce errors, and the errors in different

datasets are independent of each other. Second, each organization has its own

approach and requirements. Hence, different organizations use different measurement

techniques and may record spatial properties of entities using a different scale

or a different structure (Friis-Christensen et al. 2005 ; GSDI 2005 ; Hampe

et al. 2004 ). For example, one organization might represent buildings as points,

while another could represent them as polygons. While an estimated point location

can be derived from a polygonal shape, it may not agree with a point location in

another database. A third reason is displacement caused by cartographic

generalizations. For the above reasons, location-based joins do not provide a

precise answer, but rather an approximation. The quality of the approximation is

determined by characteristics of the joined datasets, such as sizes of the errors, the

density of objects, and the relative overlap. In this paper, we introduce algorithms

for location-based join of three or more sources, under the assumptions that

locations are given as points and each dataset has at most one object per real-world

entity. The rationale underlying all our algorithms is that even in the presence
