What is the Proper Unit of Analysis in GIS?—Using Tessellations and Related Database Issues Part 1

In the maps we have discussed so far, the units within which data are displayed are defined by city boundaries or by U.S. Census Bureau definitions of geography, such as tracts or block groups. There are many good reasons for using such geographic boundaries for geocoding and mapping. First, any address-based system of locating places is based on the definitions each political system uses to identify places. Residences and businesses in the United States are grouped according to political and administrative units with defined boundaries, such as cities, towns, counties, states, and the nation as a whole. Our concepts of space are based on such divisions: when you meet someone, one of the first questions you ask is "Where are you from?" The usual answer is the name of a town, village, or city. When the question is more specific, such as "Can I get your address so I can send you a package?", the answer takes the same form as the addresses you have used to geocode locations in this topic. Further, when you started to construct thematic maps, the boundaries used to summarize the data came from the Census Bureau’s definitions of space. All of these boundaries are arbitrary: at some time in the past, somebody decided that the boundary of Riverside was going to be in this location, and then Census Bureau officials used their rules for establishing the boundaries of census tracts and block groups, and so on. These constructions, which are independent of any consideration of what data you wish to study, understand, and use to inform decision making, were made in some cases many years ago and in a very arbitrary manner. Some of the spatial conflicts between nations that we considered earlier came about because someone far away, with little understanding of the geography and the social facts on the ground, drew an arbitrary line on a map.


However, the use of these arbitrary boundaries may be appropriate because they give us access to additional kinds of data that are useful for our research and our decision making. For example, in Figures 1.96 and 1.97, we saw how the number of divorces may be related to the number of juvenile crimes in a block group. Because we coded the juvenile crimes into block groups, we gained access to Census Bureau data on the number of divorces, so we could ask the question, "Is there a relationship?" It makes sense to ask such a question, as one factor in juvenile delinquency may be the relationship a youth has with his or her parents. In the case of divorce this relationship may be strained, particularly as a single parent usually has custody of a child after a divorce, and that parent may be working and commuting long hours and so be unable to supervise the child as much as would be possible in a two-parent household. The lack of supervision is the real factor that may result in higher rates of juvenile crime, but the number of divorced individuals in the area may serve as a proxy for it.

In general, there are many opportunities to look for relationships among variables of interest, and the more we use standard units, the more likely we are to have access to the indicators we are interested in. However, this is not the only way to group data together on a map. In some ways the arbitrary nature of these units creates difficulties for research into the geospatial relationships present in these types of data. This will become more apparent in Section 3, when we discuss the ins and outs of spatial statistical modeling and its uses. For now, think about a street gang committing violent crime in a neighborhood. These youths will hardly notice, or even be aware, that the route they usually walk from their homes to the local convenience store crosses a block group boundary. In fact, they are likely to believe and act as if the entire area around their homes and the store is part of their "territory," and they may act on this feeling by assaulting any other youths who come into this area. The violent crimes that result would be geocoded into several block groups, so we would be splitting the gang's behavior into discrete units when it is really part of a larger whole. This can introduce bias into any analysis of these data and of the relationships we want to examine. In Section 3 you will learn how to compensate for such features of geographically based behavior in your analyses, but this example illustrates the problem of using artificial boundaries.

What is the alternative? It is possible to group data linked to physical space into units that make sense given how the occurrences of the behavior of interest are distributed across space. For example, we could redraw the pin map in Figure 1.31 to reflect the distribution of juvenile violence without regard to the block group boundaries. The resulting groupings are called "tessellations," referring to the tile-like pattern such units often display once they have been identified and mapped. Such techniques can be used to form units of analysis for spatially clustered data based on the physical location of events, independent of the location of arbitrary census or political/administrative boundaries.
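
As a rough illustration of grouping events by where they occur rather than by the unit they fall in, the sketch below links any two event locations that lie within a chosen distance of each other and labels each connected set as one cluster. This is only one simple way to do it (a single-linkage grouping), not the specific technique used in this handbook. The function name, the coordinates, and the `max_gap` threshold are all hypothetical, and the coordinates are assumed to be in a projected system where straight-line distance is meaningful.

```python
from math import hypot

def cluster_points(points, max_gap):
    """Label points so that any two points within max_gap of each other
    (directly or through a chain of neighbors) share a cluster label."""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # already assigned to an earlier cluster
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            xi, yi = points[i]
            for j, (xj, yj) in enumerate(points):
                # Pull in every unlabeled point close enough to point i.
                if labels[j] is None and hypot(xi - xj, yi - yj) <= max_gap:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

# Hypothetical event locations; no block group boundary is consulted.
events = [(0, 0), (1, 0), (1, 1), (10, 10), (11, 10), (50, 50)]
labels = cluster_points(events, max_gap=1.5)
print(labels)  # [0, 0, 0, 1, 1, 2]
```

The result depends heavily on `max_gap`: a larger value chains more events together, so the threshold itself is an analytic choice that should be reported.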

One disadvantage of this approach is the relative lack of data from other sources that can be related to the pattern of events found across the space being studied. Although some researchers have used algorithms to address this problem, producing weighted summaries of existing boundary-based data for the newly formed tessellations, these algorithms all contain untestable assumptions about how the other variables are distributed in the space, and thus cannot be recommended. However, some researchers believe that the advantages of constructing units based on the actual spatial distribution of the events far outweigh any disadvantages in the measurement of other factors that might be related to these events and their spatial distribution.
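
To make the nature of that assumption concrete, the sketch below shows simple areal weighting, one common form of such weighted summaries. The text does not name a specific algorithm, so this choice is an assumption, and it is shown to expose the assumption rather than to endorse the method. All names and numbers are hypothetical.

```python
def areal_weighted_estimate(source_values, source_areas, overlap_areas):
    """Estimate a count for one new unit from the source units it overlaps.

    source_values: unit id -> count reported for the whole source unit
    source_areas:  unit id -> total area of the source unit
    overlap_areas: unit id -> area of that unit lying inside the new unit
    """
    estimate = 0.0
    for unit, overlap in overlap_areas.items():
        # ASSUMPTION: the variable is spread evenly over the source unit,
        # so the share of area equals the share of the count.
        share = overlap / source_areas[unit]
        estimate += source_values[unit] * share
    return estimate

# Hypothetical block group data (counts of divorced individuals, areas in km^2).
divorces = {"BG1": 40, "BG2": 100}
areas = {"BG1": 2.0, "BG2": 4.0}
overlap_with_new_unit = {"BG1": 1.0, "BG2": 1.0}

estimate = areal_weighted_estimate(divorces, areas, overlap_with_new_unit)
print(estimate)  # 45.0
```

The estimate of 45 is only right if divorced individuals are spread evenly across each block group. If most of them in BG2 live in the part outside the new unit, the figure is too high, and nothing in the block group data can reveal that.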

Within the scope of the maps and GIS techniques covered in this handbook, we can explore some of the implications of these boundary issues by using clustering techniques to modify our standard mapping approach. First, we will see what can be learned about our data and their distribution in space, and then we will demonstrate how these techniques can be used.

The data on youth violence and police contact we have used in several previous examples will be useful once again for this discussion. Figure 2.31 displays a thematic map based on the number of youth violent events. Looking at this map in light of the discussion here, we can start to see ways in which the block groups, arbitrarily drawn from the point of view of our data on youth violence (but drawn according to U.S. Census Bureau algorithms for use in the Census of the Population), might be combined based on similarities in the distribution of events, that is, the way events cluster in space. If we add the pin map to the thematic map, remembering that some of the points displayed on the pin map represent multiple events, this clustering becomes even more apparent (Figure 2.32).

FIGURE 2.31 Number of youth violent incidents, Riverside

FIGURE 2.32 Pin and thematic map number of youth violent incidents

A detailed view of this map shows the potential for redrawing the boundaries of units with similar patterns of occurrence of these events in space that do not necessarily correspond to the block group boundaries. For example, there seem to be two relatively tight clusters of events in the detailed view shown in Figure 2.33 that cross several block group boundaries.

FIGURE 2.33 Clustering of events across arbitrary boundaries

The events in the middle of the map, in the darker and light gray (orange and yellow) block groups, form one tightly grouped cluster that crosses several boundaries and has relatively blank space around it. A similar pattern appears on the left of the map, in the set of block groups with a black (red) unit in the center, surrounded by darker gray (orange) units and some light gray (yellow) units to the far left. You can also see the effect of boundaries, with lines of events strung out along the streets that in many cases form the boundary of a block group. Is there a way to reconsider these units, or at least to combine block groups based on the similarity in the number and distribution of events?

One way to address this question is to use a clustering approach that compares the spatial distribution of events within a block group to that within the surrounding block groups. If the distribution and number of events in a block group are similar to those in a neighboring block group, it could be argued that the boundary between the two is arbitrary with regard to the events and where they happen. From this point of view, it could make sense to eliminate the boundary between the two block groups and form a larger "tessellation" from them. This process could be continued so that each block group is linked with neighboring units that have similar patterns or clusters of events, merging large groups of units into new units that reflect the underlying distribution of these events.
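
The merging process described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used to produce the maps in this handbook: it compares only event counts (not the spatial pattern of events inside each unit), it treats two neighbors as "similar" when their counts differ by no more than a hypothetical threshold `max_diff`, and the block group ids, counts, and adjacency list are invented for the example.

```python
from collections import defaultdict

def merge_similar_neighbors(counts, adjacency, max_diff):
    """Merge adjacent units whose event counts differ by at most max_diff.

    counts:    unit id -> number of events in that unit
    adjacency: iterable of (unit_a, unit_b) pairs that share a boundary
    max_diff:  assumed similarity threshold chosen by the analyst
    """
    parent = {unit: unit for unit in counts}

    def find(unit):
        # Follow parent links to the representative of the unit's group.
        while parent[unit] != unit:
            parent[unit] = parent[parent[unit]]  # shorten the path as we go
            unit = parent[unit]
        return unit

    for a, b in adjacency:
        # Drop the shared boundary only when the two units look alike.
        if abs(counts[a] - counts[b]) <= max_diff:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for unit in counts:
        groups[find(unit)].add(unit)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical block groups lying in a row, with invented event counts.
event_counts = {"BG1": 12, "BG2": 10, "BG3": 1, "BG4": 0, "BG5": 11}
shared_boundaries = [("BG1", "BG2"), ("BG2", "BG3"),
                     ("BG3", "BG4"), ("BG4", "BG5")]

tessellations = merge_similar_neighbors(event_counts, shared_boundaries, max_diff=2)
print(tessellations)  # [['BG1', 'BG2'], ['BG3', 'BG4'], ['BG5']]
```

BG5 has a count close to BG1 and BG2 but stays separate because it does not share a boundary with either, which reflects the requirement that only neighboring units are merged. Because merges chain from neighbor to neighbor, units at opposite ends of a merged group can differ by more than `max_diff`. A stricter rule would compare each candidate against the whole group.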
