Columbia). Data were obtained from the Web site of the National Climatic Data Center
(www.ncdc.noaa.gov/oa/ncdc.html). The point attributes used included annual averages of:
(1) the numbers of days classified as cloudy, clear, or sunny;
(2) humidity;
(3) precipitation;
(4) snowfall;
(5) average, minimum, and maximum temperature;
(6) average and maximum wind speeds.
Note that the attribute data obtained consisted of just over 200 points distributed across
the contiguous United States. Future studies will include a much larger point data set, thus
providing a better match between the granularity of these data and that of the block groups
and the high-resolution self-organizing map.
The methodology then called for interpolation of all attributes to continuous raster
grids, followed by a zonal average computed for each block group. This posed unexpected
challenges in preparing appropriate source data before SOM training could even occur.
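The zonal-average step can be illustrated with a minimal sketch. It assumes a regular grid whose origin is at (0, 0), treats a cell as belonging to a zone when the cell centroid falls inside the zone polygon, and uses a plain ray-casting test. The function names, the grid values, and the two polygons are hypothetical and are not taken from the study, which used ArcGIS zonal operators.

```python
from statistics import mean


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_mean(grid, cell_size, polygon):
    """Mean of all cells whose centroids lie inside polygon, or None if there are none."""
    values = []
    for row, line in enumerate(grid):
        for col, value in enumerate(line):
            cx = (col + 0.5) * cell_size  # centroid x of this cell
            cy = (row + 0.5) * cell_size  # centroid y of this cell
            if point_in_polygon(cx, cy, polygon):
                values.append(value)
    return mean(values) if values else None


# Hypothetical 3 x 3 interpolated surface with 1-unit cells.
grid = [
    [10.0, 12.0, 14.0],
    [11.0, 13.0, 15.0],
    [12.0, 14.0, 16.0],
]
large_zone = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]  # covers four centroids
small_zone = [(0.6, 0.6), (0.9, 0.6), (0.9, 0.9), (0.6, 0.9)]  # covers no centroid

large_result = zonal_mean(grid, 1.0, large_zone)
small_result = zonal_mean(grid, 1.0, small_zone)
```

Under the centroid rule assumed here, a zone smaller than a grid cell can receive no value at all, which is why the choice of cell size matters for small zones.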
Note that block groups represent a detailed tessellation of geographic space into areas of
varied shape and size. Each block group is an aggregation of several census blocks into
units containing around 1500 people, with a range of around 600-3000 people. Thus, block
groups in rural areas can be quite large, while urban block groups literally consist of only a
few city blocks. Given this potentially very small size, the interpolation method and method-
specific settings have to be carefully chosen. To that end, all attributes underwent a rigorous
process of cross-validation, where one point observation at a time is removed, interpolation
is performed, and the predicted and known values are compared. When this is done for all points,
a summary measure based on a root mean square error (RMS) can be computed. This was
performed for several dozen combinations of interpolators and settings. In all cases, some
variation of kriging produced the best result, though with different models (e.g. spherical,
Gaussian, etc.), reflecting different patterns of spatial variation for the various attributes.
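The leave-one-out procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the workflow used in the study: inverse distance weighting stands in for kriging to keep the example self-contained, and the function names, the candidate power settings, and the station coordinates and values are all hypothetical.

```python
import math


def idw_predict(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (sx, sy, value) samples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def leave_one_out_rms(samples, power=2.0):
    """Remove each point in turn, predict it from the rest, and return the RMS error."""
    squared_errors = []
    for i, (x, y, known) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        predicted = idw_predict(x, y, others, power)
        squared_errors.append((predicted - known) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical station observations: (x, y, attribute value).
stations = [
    (0.0, 0.0, 10.0),
    (1.0, 0.0, 12.0),
    (0.0, 1.0, 11.0),
    (1.0, 1.0, 13.0),
    (0.5, 0.5, 11.5),
]

# Compare several settings of one interpolator and keep the lowest RMS error.
candidate_powers = (1.0, 2.0, 3.0)
rms_by_power = {p: leave_one_out_rms(stations, p) for p in candidate_powers}
best_power = min(rms_by_power, key=rms_by_power.get)
```

The same loop applies to any interpolator: replacing `idw_predict` with a kriging predictor and iterating over variogram models instead of powers would give the kind of comparison across interpolators and settings that the text describes.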
The most difficult lessons learned in the preprocessing of SOM training data related to
the limits of current commercial off-the-shelf (COTS) GIS software in performing spatial
analysis on very large data sets. Throughout this process, ArcGIS 9.1 was used, including
the Geostatistical Analyst extension. Given the potentially very small size of block groups,
attributes were at first interpolated at a pixel resolution of 1 km². Then, an attempt was made
to compute a zonal average for each of the 200 000+ block groups. However, even at a pixel
size of 1 × 1 km, standard zonal operators fail for a large number of urban block groups,
because they do not contain a single pixel centroid. This was circumvented by converting
pixel centroids to points, resulting in very large point files. There are numerous ways to then
implement point-to-polygon transfer of zonal attributes, most of which work reasonably
well for small subsets, such as a single city or county, consisting of up to a few thousand
block groups. However, even for those subsets, overlay operations take a significant amount
of time. For larger data sets, execution would theoretically take several days, but overlays