specifying the primary and alternate names of the place, the geometry and the
scalerank of the place (see Fig. 1). However, the names of places in the NE dataset
may not have perfectly matching counterparts in the OSM table. To include other
variations of names for the same place, we first join the Geonames dataset with
Natural Earth. This increases the possibility of a perfect string match of place
names with the OSM dataset. To perform this join, we use the 'geonameid' column
present in both datasets.
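The join on 'geonameid' can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function name `join_names`, the dictionary layout, and the sample IDs and scalerank values are all hypothetical, and a real run would more likely perform the join in the database.

```python
from collections import defaultdict


def join_names(ne_rows, geonames_rows):
    """Attach Geonames name variants to NE records sharing a geonameid."""
    # Index the Geonames names by geonameid for fast lookup.
    names_by_id = defaultdict(list)
    for row in geonames_rows:
        names_by_id[row["geonameid"]].append(row["name"])

    joined = []
    for place in ne_rows:
        # Left join: every NE place is kept, even without a Geonames match.
        extra = names_by_id.get(place["geonameid"], [])
        joined.append({**place, "geonames_names": extra})
    return joined


# Made-up sample rows (IDs and scalerank values are placeholders).
ne_sample = [
    {"geonameid": 1, "name": "Prague", "scalerank": 2},
    {"geonameid": 2, "name": "Brno", "scalerank": 6},
]
geonames_sample = [{"geonameid": 1, "name": "Praha"}]
joined_sample = join_names(ne_sample, geonames_sample)
```

Each NE record then carries its extra name variants, which gives the later string comparison more chances of an exact match.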
For every record (place) found in NE for the Czech Republic, we could scan the
entire OSM table to find a match based on the name of the place. The problem is
that the OSM table is huge, and a search on any non-indexed column slows the
query down. To solve this issue, we first query the OSM table for all places near
a record fetched from the NE table, based on a comparison of the geometry fields
of both datasets. To define nearby places, we set a search_range variable to a
distance, for example search_range = 10,000 m. We then fetch all places from the
OSM table that are within a distance of search_range from the selected place in
NE. The OSM table is indexed on its geometry column, hence the query runs faster.
We also limit the search of nearby places to place = 'city', 'town' and 'village'
in the OSM dataset.
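A nearby-places query of this kind might be built as in the sketch below. It is only an illustration under several assumptions that the text does not confirm: the geometry column is called `way` and the name column `name` (as in a default osm2pgsql import), the data is stored in EPSG:3857, and the query is sent through a driver that accepts psycopg-style named placeholders. The helper name `nearby_places_query` is hypothetical. Note that distances in EPSG:3857 are only approximately metres.

```python
def nearby_places_query(x, y, search_range=10000):
    """Return (sql, params) selecting OSM places near the point (x, y)."""
    # ST_DWithin can use the spatial index on the geometry column,
    # which is what keeps the search fast on a very large table.
    sql = (
        "SELECT osm_id, name, "
        "ST_Distance(way, ST_SetSRID(ST_MakePoint(%(x)s, %(y)s), 3857)) AS dist "
        "FROM planet_osm_point "
        "WHERE place IN ('city', 'town', 'village') "
        "AND ST_DWithin(way, ST_SetSRID(ST_MakePoint(%(x)s, %(y)s), 3857), "
        "%(search_range)s) "
        "ORDER BY dist"
    )
    params = {"x": x, "y": y, "search_range": search_range}
    return sql, params


# Placeholder coordinates, not taken from the dataset.
sample_sql, sample_params = nearby_places_query(1600000.0, 6450000.0)
```

Ordering by distance in the query means the rows already arrive nearest first, which suits the matching step that follows.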
We now have a list of nearby places from the OSM table, of which ideally one
should match the selected place from the NE table. We perform a string matching
routine that compares the selected NE place name with every nearby place
retrieved from the OSM table, going from nearest to farthest from the NE place.
At the first perfect match, we terminate the search for the current place and
update the matching record in the OSM table with the scalerank value of the NE
place, using the query below.
UPDATE planet_osm_point SET osm_scalerank = 9 WHERE osm_id = 1601566699;
During the string matching routine for a place, we compare all of the names
retrieved from the NE and Geonames datasets with the OSM 'place' column. The NE
columns compared are 'name', 'namealt', 'nameascii', 'meganame' and 'namepar'.
The Geonames column used here is 'name'. We continue this process until we have
gone through all the places fetched from the NE dataset for the Czech Republic.
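The exact-match pass described above could look roughly like the following sketch. The function names, the `dist` field, and the `name` key holding the OSM-side string are hypothetical. The sample records are made up, with the `osm_id` and scalerank values borrowed from the UPDATE query above purely for illustration.

```python
NE_NAME_COLUMNS = ("name", "namealt", "nameascii", "meganame", "namepar")


def candidate_names(ne_place, geonames_names=()):
    """Collect every non-empty name variant known for one NE place."""
    names = {ne_place.get(col) for col in NE_NAME_COLUMNS}
    names.update(geonames_names)
    return {n for n in names if n}


def first_exact_match(ne_place, nearby, geonames_names=()):
    """Return the osm_id of the nearest OSM place with an identical name."""
    wanted = candidate_names(ne_place, geonames_names)
    # Walk the candidates from nearest to farthest.
    for osm in sorted(nearby, key=lambda p: p["dist"]):
        if osm["name"] in wanted:
            return osm["osm_id"]  # stop at the first perfect match
    return None


ne_place = {"name": "Prague", "nameascii": "Prague", "scalerank": 9}
nearby = [
    {"osm_id": 42, "name": "Kladno", "dist": 2500.0},
    {"osm_id": 1601566699, "name": "Praha", "dist": 120.0},
]
matched_id = first_exact_match(ne_place, nearby, geonames_names=["Praha"])
unmatched_id = first_exact_match(ne_place, nearby)
```

In this sample the match succeeds only because the Geonames variant "Praha" is included, which is the situation the join with Geonames is meant to cover.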
In a very few cases, no perfect string match is found in the OSM dataset for a
place from NE. In these cases we attempt string matching using a fuzzy approach,
for which we define a required percentage of closeness to a perfect match. For
example, the two strings must be 80 % close to a perfect match to qualify as the
same place. If such a match is found, the scalerank value for the place in OSM is
updated with the value from NE. GeoNames thus serves as an intermediate source of
alternate names for matching places between NE and OSM by complete, partial or
fuzzy matching, after which the value is written to the proper OSM table column;
the query is below. String matching with the GeoNames alt_names, combined with
the geoid and geom fields of NE and OSM, yields much higher accuracy than just
80 %, as shown in the results section that follows.
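The 80 % closeness test can be illustrated with Python's standard `difflib` module. The text does not say which similarity measure was used, so `SequenceMatcher.ratio()` is only a stand-in, and the function names, the case folding, and the sample records are assumptions.

```python
from difflib import SequenceMatcher


def closeness(a, b):
    """Similarity of two strings in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_fuzzy_match(names, nearby, threshold=0.8):
    """Return the osm_id of the closest OSM name at or above the threshold."""
    best_id, best_score = None, threshold
    for osm in nearby:
        for name in names:
            score = closeness(name, osm["name"])
            if score >= best_score:
                best_id, best_score = osm["osm_id"], score
    return best_id


# Made-up candidates; the IDs are placeholders.
fuzzy_nearby = [
    {"osm_id": 7, "name": "České Budějovice"},
    {"osm_id": 8, "name": "Praha"},
]
fuzzy_id = best_fuzzy_match(["Ceske Budejovice"], fuzzy_nearby)
no_fuzzy_id = best_fuzzy_match(["Brno"], fuzzy_nearby)
```

Here the ASCII spelling still clears the 80 % threshold against the name with diacritics, while an unrelated name yields no match and leaves the OSM record untouched.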