Geospatial Modeling and GIS

This section differs in nature and purpose from the previous two, which were designed to introduce GIS and map making for dissemination, research, and policy decision making, at the most basic level (especially in Section 1), to social science students and researchers from sociology, criminology, political science, economics, public health, and other social and behavioral sciences. This section takes on a much more advanced topic, spatial modeling, and our goal is to use the same philosophical approach that we took in Sections 1 and 2. We hope to explain this complex topic in an accessible manner and to provide examples and step-by-step instructions, both to illustrate the workings of software for spatial analysis and to demystify the process for the first-time user. However, because the topic is more complex, we expect that most undergraduate students will have difficulty moving directly on to this section. Spatial modeling can be seen as an extension of regression analysis, a mainstay of quantitative social science and policy research in many disciplines. A reader without a basic course in regression analysis or significant experience as a researcher using regression analyses would find it very difficult to take up this section, however gallant our effort to simplify and explicate this topic. So why include this topic here at all?

Spatial modeling is the next logical step from the end of Section 2, where we discuss making and using multivariate complex maps to support analytic thinking, test hypotheses and relationships in space and time, and support policy decision making in the complex realities that communities and people exist in. The maps you made in Section 2, as complex and interesting as they are, have limits, and the way to approach those limits and to overcome them is via spatial modeling. One of the last examples in Section 2, concerning immigration and unemployment in the United States, which examines the relationship between these variables, is charged with political, economic, and social controversy. Using maps and three variables, we were able to suggest that a common assumption that immigration leads to higher unemployment may be spurious and should not be used to make immigration policy. However, how many more variables could we explore using a map-based framework and be able to make any sense out of what we were seeing? This section on spatial modeling is included here for the students, researchers, scholars, or just plain readers of this topic who want to go further, to take the next step to be able to explore social contexts and the spatial realities around them with the most appropriate tools and with the confidence that their inferences and decisions are the best they can be substantively and scientifically.


Before moving from Section 2 to Section 3, the reader should have regression experience or be able to get through a text on basic multivariate regression such as Allison’s Multiple Regression (1999) or Schroeder et al.’s Understanding Regression Analysis (1986).

So far in this topic you have seen the way mapping information, even complex and multiple pieces of information, is possible and can be very effective at showing relationships and changes across time and space. Even these maps start to have their limits, however, and the conclusions drawn from them can be suspect, because the social, political, economic, and interpersonal worlds that our data are taken from are complex and multifaceted places. A number of processes are occurring simultaneously in the real world, and just because we can isolate a few of these on a map and show what appears to be a compelling story about, for example, a relationship between crime and alcohol availability, or the lack of a relationship between HIV infection rates and national rivalries, does not mean that we have fully explored the entire realm of possibilities. This is not to say that the maps are not a powerful means with which to support decision making, make policy, and test relationships, at least, that is, in all the examples we have discussed thus far. The realities we need to deal with to be effective at the tasks GIS can help us with call for a greater complexity to the approach we take to gain understanding about how these complexities operate. This section is about how to model the multivariate complexities of the geospatial realities we have already shown you how to construct using GIS. As such this section on spatial modeling is a natural extension of the previous section in which you learned to make complex and multifaceted maps to examine real world problems and relationships.

The situation with regard to GIS is similar to developments over the past 50 years in most social and behavioral science disciplines. In the 1950s, sociology was dominated by research that examined a few important factors at a time, for example looking at the link between education, income, and occupational prestige (Duncan, 1961). During the 1970s, sociologists began to use complex regression and log-linear models with 10 or 20 variables to model occupational prestige, income, and status attainment (see Sorenson, 1979). By the end of the twentieth century, sociologists were regularly using multilevel models and structural equation models, approaches that involve modeling systems of equations to more fully understand the complex sociological realities they study (Hagan et al., 1996; Wheaton and Clarke, 2003). This does not undermine the validity of qualitative research (see Burawoy, 2003) in any way, but simply reflects recognition of the complexity of sociological phenomena, and the desire on the part of sociologists who do quantitative modeling to have at their disposal the best models available for handling that complexity.

It has been argued that the maps we have discussed thus far in this topic are the booby prize of GIS. By this we do not mean to suggest that the maps are unimportant or not worth the considerable effort that, as you have seen by following the examples, it takes to plan, construct, and interpret them. However, maps are limited in their ability to test properly specified hypotheses, advance theoretical models, or evaluate interventions. This line of reasoning suggests instead that the real power of GIS is as a method for linking geographic and spatial data with non-spatial data that have geospatial roots or links, so that databases can be constructed that unify these often disparate sources and types of data. These more complex databases allow us to address more interesting, more realistic, and more complex questions, but the additional complexity means that we must go beyond the capacity of even the most strategically constructed multivariate map in order to model and understand what these databases can tell us about our world. As with the methodological developments in many behavioral and social sciences, we therefore need to embrace the complexity our data can give us and use more complex analytic approaches to understand it.

This argument also does not undermine the value of the maps we have discussed thus far in this topic. The well-conceived and well-displayed map, as we have already seen, can be a powerful way to convey important and complex information. Suppose, however, that we were able to buttress the conclusions of the map with a complex, multivariate geospatial statistical model that used the same data as the map, plus additional data from the GIS database constructed to make it, and that this model supported the conclusions implied by the map. This would be an ideal situation to be in. It is analogous to the way sociologists and other social science researchers often present correlations or tables early in an analysis and then use a multivariate model to see whether the basic relationships of interest hold up in the face of additional controls and alternative hypotheses. When the original, theoretically implied relationship holds up in a complex model that accounts for the appropriate rival hypotheses and statistical controls, this is a very powerful endorsement of the theoretical model and of the importance of the observed and verified relationship for understanding the phenomena under study. The same is true of GIS and mapping: if we can also use geospatial modeling techniques to verify what the map is saying, we are in a much stronger position to advocate for the conclusions the map suggests or the policy it recommends, or to argue against an intervention the map shows to be ineffective or wrongheaded. However, this will not always be the case.
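A case in which an apparent relationship does not survive an additional control can be sketched in a few lines of Python. This is only an illustration on synthetic data, not an analysis from this section: the variable names (x, y, z), the sample size, the seed, and the coefficients are all assumptions chosen for the example. Here z drives both x and y, so x and y look related until z is controlled for. The controlled slope is computed by regressing residuals on residuals, which for ordinary least squares gives the same coefficient on x as a model of y on both x and z.

```python
import random
from statistics import mean


def slope(x, y):
    """Ordinary least squares slope of y on x (intercept included)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate(n=2000, seed=0):
    """Synthetic data (an assumption): z causes both x and y; x has no effect on y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [2 * zi + rng.gauss(0, 1) for zi in z]
    return x, y, z


x, y, z = simulate()

# What a two-variable map or table would suggest: y rises with x.
bivariate = slope(x, y)

# The same slope after removing the part of x and y explained by z.
controlled = slope(residuals(z, x), residuals(z, y))

print(f"slope of y on x, no control:      {bivariate:.2f}")
print(f"slope of y on x, controlling z:   {controlled:.2f}")
```

With these assumed settings the uncontrolled slope comes out clearly positive while the controlled slope is close to zero, which is the pattern a spurious relationship produces.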

Consider the chart in Figure 3.1, which is based on a gang intervention program in Riverside, California. Project Bridge was one of the original sites in Irving Spergel’s National Demonstration Project of the Comprehensive Model for Gang Prevention, Intervention, and Suppression, developed by Spergel based on his long experience as a gang interventionist and researcher (see Spergel, 1999 or Klein and Maxson, 2006).

FIGURE 3.1 Outcome results from Project Bridge, Riverside, CA, 1994-2000

The interventions in each site were tailored to local conditions. In Riverside, the intervention targeted two neighborhoods that the local people involved in the project believed to be the most seriously gang-encumbered areas in the city. A third area, also having significant gang involvement, was selected as a control area, where no interventions from the project were implemented. In this way, the researchers evaluating the intervention could compare its impact in two "treatment" areas, to use a public health-style term, and could also compare the two treatment areas with a "no treatment" area, or comparison neighborhood. This is a stronger design than having just one implementation area and one no-treatment area, because you have two additional comparisons to make; in a sense, these represent alternative trials of the intervention and its impact. You can compare the two intervention areas to each other, and if they differ in the outcome measure, in this case youth violence rates, you can study the differences in implementation and in starting or background conditions to seek an explanation. In a way you have a replication of the intervention study within the same site. The replication is not independent, but it provides an additional test of the impact of the intervention. In addition, you can compare both intervention areas to the control or no-treatment area, together and independently.

Examining Figure 3.1, you can see that the intervention seems to have worked in both treatment areas, although the two areas, East Side (line with squares) and Arlanza (line with circles), responded somewhat differently over the seven time points represented here. The Arlanza area seemed to have a more effective intervention sooner, and youth violence started to drop steeply in the first year of the implementation, 1995. The East Side had a less smooth start to the intervention, and rates actually increased before they started to drop. Both areas declined significantly (t = 3.17, significant at the 0.05 level, in a comparison between each treatment area and the no-treatment area, Casa Blanca, the line with triangles), although the East Side rate began to increase somewhat in 1999 and 2000.

The problem with these interpretations is that there may be other factors that could explain why youth violence rates dropped the way they did over this time period in Riverside. For example, suppose that the three areas differ in ethnic or racial composition, and that during this period one group had significant decreases in gang activity in one of the areas; in that case the intervention could have had nothing to do with the decline. Yet if we look only at the graph in Figure 3.1, or the map in Figure 3.2, which gives the neighborhoods and their youth violence rates at the baseline year for Project Bridge, 1994, we risk making a mistake in our inference about why the results came out the way they did. Perhaps there were changes in other variables that could explain the changes we see, and unless we can assess those factors, we cannot be certain that our conclusion about the success or failure of Project Bridge is correct. Perhaps these areas differ in some other major predictors of youth violence, like poverty, family structure, or opportunity structure, all factors that have a long history of explaining changes in youth violence. So although the map can tell us a lot, we need more complex modeling techniques to fully understand the outcomes of an intervention like Project Bridge, which we will examine later in this section. If we were shown only a set of maps and a graph about Casa Blanca, we would conclude that an intervention must have caused the downturn in youth violence, but there was in fact no intervention there in this project. The fact that youth violence rates were declining across the country during this period (Bratton, 1998) helps us to understand the decline in Casa Blanca; knowing this makes the even sharper declines in the treatment areas all the more impressive, although we still do not know whether we can attribute them to Project Bridge.
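The arithmetic behind comparing a treatment area's decline with the decline in the comparison area can be sketched as a simple difference-in-differences calculation. This technique is named here only for illustration and is not presented as the evaluation method used for Project Bridge. The rates below are invented numbers, not Project Bridge data, and the function name diff_in_diff is hypothetical.

```python
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment area minus change in the comparison area.

    A negative result means the treatment area fell by more than the
    comparison area did over the same period.
    """
    return (treat_post - treat_pre) - (control_post - control_pre)


# Invented youth violence rates (per 1,000 youth); NOT Project Bridge data.
rates = {
    "treatment": {"pre": 100.0, "post": 60.0},
    "comparison": {"pre": 100.0, "post": 85.0},
}

effect = diff_in_diff(
    rates["treatment"]["pre"],
    rates["treatment"]["post"],
    rates["comparison"]["pre"],
    rates["comparison"]["post"],
)

# The treatment area fell by 40 and the comparison area by 15, so the
# decline beyond the background trend is 25 points.
print(effect)  # -25.0
```

Subtracting the comparison area's change removes a trend shared by both areas, such as a nationwide decline, but it does nothing about differences in poverty, family structure, or composition between the areas, which is why a multivariate model is still needed.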

FIGURE 3.2 Youth violence in the three areas of Project Bridge, 1994

A second notion that underscores the importance of spatial modeling in GIS is the potential that such statistical models have for explaining why the relationships we see in the map occur the way they do. It is one thing to verify that the relationship suggested by the map is still true when we have controlled for all the other possible influences that might exist, but the question of why such a relationship exists in space across the community, the nation, or the world is another, and extremely important, question. We need to know why things are the way they are if we hope to intervene successfully to change some of the things we can observe with the maps. We have seen maps of potential racial profiling in traffic stops, and we have seen relationships between alcohol availability and crime, and these maps imply that we can intervene to reduce or prevent such negative outcomes as racial bias and discrimination, or gang-related youth violence. If we cannot bring good evidence to bear on why these relationships exist in our community, we could intervene in a way that makes the problem worse rather than better. In addition, if we decide to try an intervention, the question becomes: how can we tell whether it is having any impact? Once again we can map the intervention and the results over time to see if they appear to be connected, but if we can evaluate the intervention's impact on the outcome in a spatial model, we will be able not only to see whether the intervention is effective but also to understand why it is effective. These arguments together show the importance of spatial modeling in the overall context of GIS as a research method and as a mechanism to support policy and decision making.

In this section we will introduce a basic approach to spatial modeling and demonstrate its utility with several examples; we will also discuss some of the software for doing spatial modeling and use one of the available packages that is particularly useful and easy to obtain and use. However, before we can proceed to show you how spatial models can be constructed and estimated, a discussion of the role of space in the causal modeling of social and behavioral processes is necessary to gain a full understanding of why spatial modeling is useful.
