Measuring the Impact of Space and Spatial Relationships

The basic statistic used to measure relationships among variables in space is the same one used to measure relationships in general: the familiar correlation coefficient, or Pearson's r. It varies between -1.0 and +1.0 and measures the degree of linear association between the values of one variable and the values of another variable measured on the same set of observations. This statistic underlies most of the models used in social and behavioral research, such as regression, structural equation models, and hierarchical models. Models that handle nonlinear relationships, and alternative approaches such as Cox regression, logit models, event history or survival models, and log-linear analysis, are all alternative ways of measuring association between variables when the correlation coefficient is not appropriate because the data are not continuous or take on only a few categorical values.
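As a point of reference for what follows, here is a minimal sketch of Pearson's r in plain Python. The function name `pearson_r` and the five-case data are hypothetical, chosen only for illustration. It computes the sum of cross-products of each variable's deviations from its mean, divided by the product of the square roots of the summed squared deviations.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means (covariance term)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the square roots of the summed squared deviations
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: two variables measured on the same five cases
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # prints 0.775
```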

We can apply the same basic approaches to spatial data that we apply to other data, but the nature of spatial data creates conditions among the observations that violate fundamental assumptions required to ensure that the results of these analyses are accurate and appropriate. For example, Pearson's r, regression, and most similar models work properly only if the observations are independent of one another, as they would be in a random sample of the population you are studying. For the cases of youth violence we have been examining in this topic, this would mean that the incidents occur at randomly distributed locations across the space we have been mapping, but we already know that this is not the case. We saw in Section 2 that these incidents cluster: they are very dense in some parts of the map and quite sparse in others. How, then, do we measure how space is related to the variables we want to map and analyze?


In an earlier example we discussed Anselin's Local Moran's I, which was used to determine how similar the values in a geographic unit were to those in its neighboring units. High positive values of the Local Moran's I suggested that the clustering was similar and that these units might be collapsed into a larger homogeneous unit referred to as a tessellation. You may have wondered at the time whether, if there is a Local Moran's I, there is also a Global Moran's I. Indeed there is, and Moran's I is the most prominent measure of spatial relationship used in the GIS literature (Moran, 1950). Moran's I is a correlation-type coefficient that measures the degree of association across space in a single variable's distribution, that is, how spatially clustered the observations on this variable are. In a very simplified form, the formula for the coefficient is:

$$ I = \frac{\sum_{i}\sum_{j \neq i} Y_i \, Y_j}{\sum_{i} Y_i^{2}} $$

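In words, the observation on a variable in one spatial unit is multiplied by the same variable's observation in another unit (the two units are indicated by the subscripts i and j), these products are summed over all pairs of units, and the sum is divided by the sum, across all units, of the squared observations. The formula takes the form of a product-moment correlation like Pearson's r: it is essentially the covariance of the observations across pairs of units divided by their variance, and in practice its values usually fall between about -1.0 and +1.0. The variable is usually first mean-centered or standardized (Y is expressed as a deviation from its mean), and this is the unweighted, "raw" version of the coefficient. The full version also multiplies each pair's product by a weight w_ij that records whether, or how closely, units i and j are connected, and rescales the result by the number of units divided by the sum of all the weights.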
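The calculation can be sketched in plain Python. This is a minimal illustration, not the implementation used by any particular GIS package: the function name `morans_i`, the four-unit example, and the binary connection matrix (1 if two units are neighbors, 0 otherwise) are all hypothetical. Because it multiplies each pair's cross-product by a weight w_ij and rescales by n divided by the sum of the weights, it computes the weighted global Moran's I rather than the simplified raw form; with zeros on the diagonal, each unit is never paired with itself.

```python
def morans_i(values, weights):
    """Weighted global Moran's I for values observed on n spatial units.

    weights is an n x n connection matrix (list of lists); weights[i][j] > 0
    means units i and j are treated as neighbors. The diagonal should be 0.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    s0 = sum(sum(row) for row in weights)   # total of all weights
    # Weighted sum of cross-products z_i * z_j over all pairs of units
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    # Sum of squared deviations (the variance term)
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


# Hypothetical example: four units in a row, each connected to its immediate neighbors
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(round(morans_i([1, 2, 3, 4], w), 4))  # prints 0.3333
```

Values that rise steadily along the row give a positive coefficient; rearranging the same kind of values so that neighbors always differ (for example `[1, 4, 1, 4]`) drives it to -1.0 for this connection matrix.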

If Moran's I is positive and close to 1.0, values in neighboring units are similar and there is a great deal of spatial clustering. If Moran's I is close to zero, the values are distributed randomly across space. If Moran's I is negative and close to -1.0, neighboring units tend to have dissimilar values, so there is little clustering of similar observations on this variable; instead the values are dispersed, as in a checkerboard pattern.
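To see how the sign of Moran's I reflects the pattern, the sketch below compares a clustered arrangement with an alternating one on eight hypothetical units arranged in a single row, using binary neighbor weights. It also shuffles the same values across the units many times, a permutation approach assumed here for illustration. When there is no spatial pattern the average Moran's I sits at -1/(n - 1), which is close to zero rather than exactly zero. The helper names `chain_weights` and `morans_i` and all data are hypothetical.

```python
import random


def chain_weights(n):
    """Binary weights for n units in a row: each unit touches its immediate neighbors."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


def morans_i(values, weights):
    """Weighted global Moran's I; values are centered on their mean first."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(map(sum, weights))
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(zi * zi for zi in z)


w = chain_weights(8)
clustered = [1, 1, 1, 1, 5, 5, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5, 1, 5]  # every neighbor differs
print(round(morans_i(clustered, w), 3))    # prints 0.714
print(round(morans_i(alternating, w), 3))  # prints -1.0

# Reference point for "no spatial pattern": the same values shuffled across units
rng = random.Random(42)
shuffled_is = []
for _ in range(999):
    perm = clustered[:]
    rng.shuffle(perm)
    shuffled_is.append(morans_i(perm, w))
print(sum(shuffled_is) / len(shuffled_is), -1 / (len(w) - 1))
```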

The purpose of spatial modeling is to explain the spatial pattern of, or the relationships among, the data, but as we saw earlier, the spatial nature of the data can affect these relationships. If the variable to be explained or predicted shows evidence of spatial patterning (spatial autocorrelation, as this effect is called), the pattern may have three sources: the values of the same variable in other spatial units, the variables you think are related to the dependent variable, or errors introduced into the model by the spatial nature of the data.
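The first of these sources is commonly represented as a spatial lag: for each unit, a weighted average of the dependent variable in its neighboring units. A minimal sketch, assuming row-standardized binary weights (each row of the connection matrix divided by its sum, so that a unit's neighbor weights add to 1); the function names `row_standardize` and `spatial_lag` and the five-unit data are hypothetical.

```python
def row_standardize(weights):
    """Divide each row by its sum so that a unit's neighbor weights add to 1."""
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out


def spatial_lag(values, weights):
    """For each unit, the weighted average of its neighbors' values (W times y)."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Hypothetical example: five units in a row, each touching its immediate neighbors
raw_w = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
y = [10, 20, 30, 40, 50]
print(spatial_lag(y, row_standardize(raw_w)))  # prints [20.0, 20.0, 30.0, 40.0, 40.0]
```

Row standardization is one common choice; it makes the lag an ordinary average of neighboring values, so it stays on the same scale as the variable itself.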

Let's consider the second scenario first. If the spatial autocorrelation measured by the raw Moran's I is caused by one or more of the independent variables you include in your model, chosen on the basis of theory and/or applied knowledge of the behaviors and locations being studied, this is perhaps the ideal situation. If you can directly model both the spatial and the non-spatial relationships among the variables, you will be able to develop a very thorough understanding of the data and their meaning. This holds only as long as you know all the correct variables to measure and can include them on your map and in your model. What happens if you do not know, or cannot measure, all of the correct variables?

To find out whether the independent variables account for the spatial pattern, you can fit a regression model with all the variables you think are important and calculate a new Moran's I statistic on the model's residuals instead of on the observations, substituting the vector of regression residuals for Y in the equation above. In doing so you would also add to the equation another feature of the space you are analyzing that helps the model understand its structure: the connection matrix, whose nature and importance we discuss below. Suppose, for the sake of argument, that the new Moran's I still reveals spatial autocorrelation in the residuals. This would indicate one of two possibilities. The first is that the spatial relationship within the dependent variable is very strong and remains significant even after accounting for the influence of the other variables in the model. The second is that you have inadvertently left out one or more variables that would in fact account for the spatial pattern that remains in the dependent variable after the other variables are accounted for. Either way, these two scenarios indicate that the standard regression model is not appropriate for the spatially influenced data you are trying to analyze.
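The residual check can be sketched under simplifying assumptions. The sketch uses one predictor fit by ordinary least squares, six hypothetical units in a row with a binary connection matrix, and an invented "omitted" variable that is spatially smooth and left out of the model. Significance is judged by comparing the observed value with Moran's I for every rearrangement of the residuals across the units. This permutation approach is chosen here for illustration, and with only six units the resulting p-value is coarse. All function names and data are hypothetical.

```python
from itertools import permutations


def ols_residuals(x, y):
    """Residuals from an ordinary least-squares line y = a + b*x (one predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def chain_weights(n):
    """Binary connection matrix for n units in a row (hypothetical layout)."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


def morans_i(values, weights):
    """Weighted global Moran's I; values are centered on their mean first."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(map(sum, weights))
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(zi * zi for zi in z)


# Hypothetical data: y depends on x plus an omitted, spatially smooth factor
# that peaks in the middle units and is not included in the model.
x = [1, 2, 3, 4, 5, 6]
omitted = [0, 4, 6, 6, 4, 0]
y = [2 * xi + oi for xi, oi in zip(x, omitted)]

resid = ols_residuals(x, y)          # the residual vector substituted for Y
w = chain_weights(len(x))
observed = morans_i(resid, w)
expected = -1 / (len(x) - 1)         # average Moran's I when there is no spatial pattern

# Permutation check: Moran's I for every rearrangement of the residuals
perm_is = [morans_i(list(p), w) for p in permutations(resid)]
p_value = sum(val >= observed - 1e-12 for val in perm_is) / len(perm_is)

print(round(observed, 3), round(expected, 3))  # prints 0.2 -0.2
print(p_value)
```

Here the residuals stay positively autocorrelated because the smooth omitted factor ends up in them, which is the second possibility described above.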
