Interlinking Opensource Geo-Spatial Datasets for Optimal Utility in Ranking - Modern Trends in Cartography


specifying the primary and alternate names of the place, the geometry and the
scalerank of the place (see Fig. 1). However, the names of places in the NE dataset
may not have perfectly matching counterparts in the OSM table. To include other
variations of names for the same place, we first join the Geonames dataset with
Natural Earth. This increases the possibility of a perfect string match of place
names with the OSM dataset. To perform this join, we use the 'geonameid' column
present in both datasets.
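The join on 'geonameid' can be sketched as follows. This is a minimal illustration using an in-memory SQLite database rather than the authors' actual setup; the table names ne_places and geonames, the reduced column lists and the sample rows (including the id values) are assumptions made only for the example.

```python
import sqlite3


def join_names(conn):
    """Return {geonameid: set of name variants} from NE joined with Geonames."""
    rows = conn.execute(
        """
        SELECT ne.geonameid, ne.name, ne.namealt, gn.name
        FROM ne_places AS ne
        JOIN geonames AS gn ON gn.geonameid = ne.geonameid
        """
    ).fetchall()
    variants = {}
    for geonameid, ne_name, ne_alt, gn_name in rows:
        # Skip NULL/empty values so that only real names are collected.
        names = {n for n in (ne_name, ne_alt, gn_name) if n}
        variants.setdefault(geonameid, set()).update(names)
    return variants


# Toy data: hypothetical tables and rows, not taken from the real datasets.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ne_places (geonameid INTEGER, name TEXT, namealt TEXT,
                            scalerank INTEGER);
    CREATE TABLE geonames (geonameid INTEGER, name TEXT);
    INSERT INTO ne_places VALUES (1, 'Prague', 'Praha', 2);
    INSERT INTO ne_places VALUES (2, 'Brno', NULL, 6);
    INSERT INTO geonames VALUES (1, 'Praha');
    INSERT INTO geonames VALUES (2, 'Brno');
    """
)
name_variants = join_names(conn)
```

Collecting the variants into a set per 'geonameid' means a later exact comparison against an OSM name only needs a membership test.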

For every record (place) found in NE for the Czech Republic, we could scan the entire
OSM table to find a match based on the name of the place. The problem is that the OSM
table is huge, and a search on any non-indexed column will slow the query. To solve
this issue, we first query the OSM table for all places near a record fetched from the
NE table, based on a comparison of the geometry fields of both datasets. To define
nearby places, we set a search_range variable to a distance, for example
search_range = 10,000 m, and fetch all places from the OSM table that are within
search_range of the selected place in NE. The OSM table is indexed on its geometry
column, hence the query runs faster. We also limit the search of nearby places to
place = 'city', 'town' and 'village' in the OSM dataset.

Now we have a list of nearby places from the OSM table, of which ideally one
place should match the selected place from the NE table. We perform a string
matching routine, comparing the selected NE place name with every nearby place
retrieved from the OSM table, going from nearest to farthest from the NE place. We
stop at the first perfect match and update the matching record in the OSM table with
the scalerank value of the NE place using the query below.

UPDATE planet_osm_point SET osm_scalerank = 9 WHERE osm_id = 1601566699;
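The nearby-place filter and the nearest-to-farthest matching loop can be sketched in plain Python. This is only an illustration under stated assumptions: the dictionary layout of the records, the haversine distance on latitude/longitude, and the helper names distance_m, find_match and update_statement are hypothetical. In the database itself the distance filter would run as a spatial query against the indexed geometry column (in PostGIS, a function such as ST_DWithin could serve; the source does not name the function it uses).

```python
import math

SEARCH_RANGE_M = 10_000                      # search_range from the text
PLACE_TYPES = {"city", "town", "village"}    # allowed OSM place values


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine on a spherical Earth)."""
    radius = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def find_match(ne_place, osm_points):
    """Return the osm_id of the nearest exact-name match, or None."""
    nearby = []
    for point in osm_points:
        if point["place"] not in PLACE_TYPES:
            continue
        d = distance_m(ne_place["lat"], ne_place["lon"],
                       point["lat"], point["lon"])
        if d <= SEARCH_RANGE_M:
            nearby.append((d, point))
    nearby.sort(key=lambda item: item[0])    # nearest first
    for _, point in nearby:
        if point["name"] in ne_place["names"]:
            return point["osm_id"]           # stop at the first perfect match
    return None


def update_statement(osm_id, scalerank):
    """Build the parameterised UPDATE for a matched OSM record."""
    sql = "UPDATE planet_osm_point SET osm_scalerank = %s WHERE osm_id = %s;"
    return sql, (scalerank, osm_id)
```

Filtering by distance before comparing any strings keeps the expensive name comparison confined to a handful of candidates per NE place.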

During the string matching routine for a place, we compare all of the names retrieved
from the NE and Geonames datasets with the OSM 'place' column. The NE columns
compared are 'name', 'namealt', 'nameascii', 'meganame' and 'namepar'. The Geonames
column used here is 'name'. We continue this process until we have gone through all
the places fetched from the NE dataset for the Czech Republic.
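Gathering the name variants from the listed NE columns into one set might look like the sketch below. Several details are assumptions rather than facts from the text: that the Geonames field is called 'name', that a column may hold several variants separated by '|', and that a "perfect" match ignores letter case and surrounding whitespace. The helper names are hypothetical.

```python
NE_NAME_COLUMNS = ("name", "namealt", "nameascii", "meganame", "namepar")


def candidate_names(ne_row, geonames_row=None):
    """Collect every non-empty name variant for one NE place."""
    names = set()
    for column in NE_NAME_COLUMNS:
        value = ne_row.get(column)
        if value:
            # Assumption: multiple variants in one field are '|'-separated.
            names.update(p.strip() for p in value.split("|") if p.strip())
    # Assumption: the Geonames name lives in a field called 'name'.
    if geonames_row and geonames_row.get("name"):
        names.add(geonames_row["name"].strip())
    return names


def is_perfect_match(osm_name, names):
    """Exact comparison, ignoring case and surrounding whitespace."""
    return osm_name.strip().casefold() in {n.casefold() for n in names}
```

Building the set once per NE place avoids re-reading five columns for every nearby OSM candidate.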

In a very few cases, no perfect string match is found in the OSM dataset for a place
from NE. In these cases we attempt string matching using a fuzzy approach, for which
we define a minimum percentage of closeness to a perfect match. For example, the two
strings must be at least 80 % close to a perfect match to qualify as the same place.
If such a match is found, the scalerank value for the place in OSM is updated with
the value from NE. GeoNames is used as an intermediate source of alternate names for
places in NE and OSM; these are matched using partial, complete or fuzzy matching and
the result is written to the appropriate OSM table column (the query is below).
String matching with GeoNames alt_names, combined with the geoid and geom fields of
NE and OSM, yields much higher accuracy than just 80 % and is shown in the results
section next.
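A fuzzy fallback with an 80 % threshold could be sketched as follows. The text does not say which similarity measure is used, so the ratio from Python's difflib.SequenceMatcher stands in here purely as an assumption; the function names and the case-insensitive comparison are likewise hypothetical.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.80   # "80 % close to a perfect match"


def similarity(a, b):
    """Similarity in [0, 1]; SequenceMatcher is an assumed stand-in metric."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def best_fuzzy_match(ne_names, osm_candidates, threshold=FUZZY_THRESHOLD):
    """Return the osm_id whose name is most similar to any NE name.

    osm_candidates is an iterable of (osm_id, osm_name) pairs. None is
    returned when no candidate reaches the threshold.
    """
    best_id, best_score = None, 0.0
    for osm_id, osm_name in osm_candidates:
        for ne_name in ne_names:
            score = similarity(ne_name, osm_name)
            if score >= threshold and score > best_score:
                best_id, best_score = osm_id, score
    return best_id
```

Running this only after the exact comparison has failed keeps the looser test from overriding a perfect match.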

