The Basics of Geocoding (GIS and Spatial Analysis)

How did we know where to put the pins in Figure 1.3? The process of identifying locations to be placed on a map is called geocoding: coding the location of an object, a place, an event, a building, or an address where something of interest took place. At its most basic, something can be geocoded by knowing its latitude and longitude; for example, the capital of California, Sacramento, is located at 38.581° N latitude and 121.493° W longitude (38.581, -121.493 when written as signed decimal degrees). Unless you really know your longitudes and latitudes, these numbers alone are not very helpful if your task is to place Sacramento in the proper location by hand. With ArcGIS software (and other GIS software packages), however, they are exactly the information needed to geocode the locations of the capitals. One of the useful options in ArcMap is the option to identify an object on the map. This option can be selected by clicking on the tool bar in the upper left-hand portion of the ArcMap screen, as shown in Figure 1.4.

FIGURE 1.4 The tool bars and the Identify tool

The "i" inside the dark circle is the Identify tool. Once you activate the tool, you can move it around the map to various objects; clicking on an object reveals what the program knows about that object or location.


In Figure 1.5, you can see the result of clicking on the pin that is likely to be the capital of Utah, Salt Lake City.

FIGURE 1.5 Identifying Salt Lake City

Among the information displayed in the drop-down box labeled Identify Results is the longitude and latitude of Salt Lake City. Notice also that another longitude and latitude pair is displayed at the bottom right of the screen; as you move the cursor around the map, this line shows the longitude and latitude of wherever the cursor is pointing.
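
Coordinates such as those quoted earlier for Sacramento are usually stored by software as signed decimal degrees, with south latitudes and west longitudes written as negative numbers. The short Python sketch below shows that convention; the function name `to_signed_degrees` is our own illustration and is not part of ArcGIS.

```python
def to_signed_degrees(value: float, hemisphere: str) -> float:
    """Convert a coordinate plus hemisphere letter to signed decimal degrees.

    North and east are positive; south and west are negative.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"Unknown hemisphere: {hemisphere!r}")
    magnitude = abs(value)  # ignore any sign already attached to the number
    return -magnitude if hemisphere in ("S", "W") else magnitude


# Sacramento, California: 38.581 N, 121.493 W
sacramento = (to_signed_degrees(38.581, "N"), to_signed_degrees(121.493, "W"))
print(sacramento)  # (38.581, -121.493)
```

Keeping the hemisphere letter and the sign separate in this way avoids the ambiguity of writing a longitude as both negative and "W" at the same time.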

Most times, however, you will not have longitude and latitude as part of a database you want to link to a map. Another very common type of information you may have access to is a street address or an intersection of two streets. This kind of information can also be used to link objects, events, and structures to a map through the process of geocoding.

Before we get into the details of how to geocode addresses, why do we want to know the address of an event of sociological, criminological, or other disciplinary significance? How can this information be useful to us? Sociology and criminology are disciplines that focus on the importance of context, as do public health and geography and, increasingly, public policy and other social sciences as well. But what does this really mean? In much of the history of these disciplines, the idea that context mattered for understanding social behavior, criminal behavior, health behavior, and so on was more platitude than a nexus for analytic understanding. Even when context really mattered, as in a theory such as Sutherland’s (1947) differential association, in which the context created by your friends was seen to influence whether or not you developed attitudes unfavorable towards the legal code and whether or not you learned how to commit illegal acts, the physical context was not treated as part of "context" at all (see Matsueda, 1982). For example, following Sutherland’s approach, you might find that two individuals have the same number and intensity of connections to delinquent peers, but in one case these peers all live within a block or two of the subject, while in the second case these peers are scattered all across the city. If the second subject showed fewer delinquent acts, this could be explained by the lack of concentrated access to these peers in the immediate environment. Being able to establish whether or not peer networks are in physical proximity could make the difference in delinquency understandable; looking only at Sutherland’s ideas as stated would not allow you to fully understand the differences in outcomes. Being able to "bring the context back in" by geocoding the location of people, events, organizations, facilities, and so on is an important tool that GIS can bring to social and behavioral research for increased power and understanding.

In the following example we will examine youth violence in the city of Riverside, California. In terms of policy, it is very much in the interest of the city and its police department to know where youth violence clusters. In such "hot spots" of crime (see Sherman and Berk, 1984), police can place extra patrols, and city officials can build neighborhood centers, expand after-school programs, provide public health nurse visits, or pursue a host of other interventions that may reduce youth involvement in crime and violence. GIS makes it possible to "see" such patterns across a city’s neighborhoods and thus to guide city and police decisions about where to deploy scarce resources and city expenditures.

Example: The Process of Geocoding

The basic procedure for geocoding address-based data into a map first involves obtaining a street database. A street database contains information that defines the lengths, shapes, directions, and address ranges for all the streets, roads, avenues, circles, culs-de-sac, and other such units that cities and towns use to label routes that have addresses located on them. The street database will also be linked to the other types of maps that can be created for a location—city, town, county, and so on. In Figure 1.6, the city of Riverside, California, circa 2000, is displayed with the U.S. Census Bureau block group units outlined.

FIGURE 1.6 Riverside, California, circa 2000, U.S. Census block groups

Block groups are units of measurement created by the Census Bureau for data collection and distribution for the decennial censuses; this map comes from the 2000 U.S. Census. Block groups in urban areas are made up of familiar city blocks, about 4 to 8 blocks per block group, with a population in 2000 ranging from a few hundred people to about 2,000 individuals. The block group is often the smallest unit of analysis for cities in the United States for which you can be assured of getting nearly complete data. At the block level, the Census Bureau is concerned that someone could identify individuals by name from the Census data, so in many cases data like ethnicity, family composition and structure, income levels, and so on are missing at the block level in Census data sets released to the public and to researchers. The block group is usually large enough and has enough people in it to make it unnecessary to suppress any data of interest. Census tracts, perhaps a more familiar unit to some, are made up of block groups.

So a basic map for Riverside has been obtained from the Census Bureau (maps like this for the entire United States are included with the ArcGIS software); now what is needed is an address database. The U.S. Census Bureau once again is the data provider in this case. In order to try and count everyone where they live, the Census Bureau needs to have a pretty good idea of all the places people live in, and where such residences can be found on the map. The U.S. Constitution requires that certain information be gathered from a full count of individuals in the country, and in order to count people the Census Bureau needs to have an address where it can deliver a basic 100 percent Census form to each residence. If a Census form is mailed to a residence and the residents do not fill it out and send it back, the Census Bureau will send someone to that address to try and see who lives there and to get the basic information for the 100 percent count. However, the first step is to mail a Census form, and to do that the Bureau needs an address for every dwelling unit or residence in the country. So the Census Bureau also creates an address database every ten years for the decennial Census.

These address databases are also included with ArcGIS software, and they can be obtained from the Census Bureau and other providers. Using the Streetmap database for 2000 from ArcGIS, we can add a layer to the map that represents all the known streets and addresses in Riverside, circa 2000.

To do this, we first add a new layer that contains the streets of Riverside to the map of Riverside block groups. We do this by selecting the Add layer command, the dark cross with a light background located on the tool bar third from the top, next to the map scale display (1 : 260,576).

FIGURE 1.7 Adding a new layer in ArcMap

This opens up a menu of the available files and layers in the default directory (from where the first map was added to this session of ArcMap); we select one from the last column on the right called rivbasemap. Notice the symbol this file is displayed with—it looks like a tiny street map. The software has already identified this database as a street database. To add this street database to the map, click the database and click on Add.

FIGURE 1.8 Adding a new database to a map

Note that the extension on this file is .shp; this is referred to as a shape file. It is a file, together with accompanying data files, that contains the information allowing these data (the streets, in this case) to be displayed in a map of Riverside. We will discuss the locating and creating of shape files later in this section; having shape files makes the creation of maps much easier. Since this is a shape file, as soon as it is added to this session of ArcMap, the streets are displayed as a new layer overlaid on the block group map.
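
The "accompanying data" matters in practice: a shape file is normally stored as several files that share one base name, with the .shp file holding the geometry, a .shx file holding an index, and a .dbf file holding the attribute data. As a minimal sketch, the Python function below checks a list of file names for those three parts. The function name and the example file list are hypothetical; only the name rivbasemap comes from the example above.

```python
from pathlib import PurePath

# The three parts every shape file needs: geometry, index, attribute table.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_name, available_files):
    """Return the required extensions that have no matching file.

    shp_name        -- name of the .shp file, e.g. "rivbasemap.shp"
    available_files -- file names found in the same folder
    """
    stem = PurePath(shp_name).stem
    present = {name.lower() for name in available_files}
    return [ext for ext in REQUIRED_EXTENSIONS
            if f"{stem}{ext}".lower() not in present]


# Hypothetical folder listing in which the index file was not copied.
folder = ["rivbasemap.shp", "rivbasemap.dbf", "notes.txt"]
print(missing_shapefile_parts("rivbasemap.shp", folder))  # ['.shx']
```

A check like this is worth running when a shape file is copied or e-mailed, since moving only the .shp file leaves the layer unusable.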

FIGURE 1.9 New street map layer displayed as an overlay on the existing map

The database for the streets contains the street names and all the recognized address ranges. Right-clicking on the street map in the table of contents window to the left of the map itself brings up a menu from which you can view the database associated with the map layer you clicked on.

FIGURE 1.10 Popup menu for map layer

The menu in Figure 1.10 has a number of useful options, one of which is to display the attribute table. This is a spreadsheet version of the data associated with the map layer, in this case the street database. Click on Open Attribute Table to access these data.

FIGURE 1.11 Attribute table for Riverside streets

The sixth column from the left in Figure 1.11 gives the name of the street, and the next column to the left gives the type of street—Ln for Lane, Rd for Road, Dr for Drive, St for Street, Wy for Way, and so on. These types are really part of the name of the street, so in Figure 1.11 if you want to find the street on a map you look for Priscilla Lane. The types can become important for the process of geocoding because addresses recorded by people are often not as accurate as the address databases compiled by the Census Bureau (although some errors can exist in these databases as we will illustrate later). For example, someone may record an event as happening at 19792 Smith St, when the actual name of the street is Smith Rd, or Road, as shown in line 5 of the attribute table in Figure 1.11. This can cause difficulties in the geocoding process, and it is a detail that the geocoder can fix via address editing, as we will see below.
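
To make the street-type problem concrete, here is a minimal Python sketch of the kind of check a geocoder might apply. It is an illustration under our own assumptions, not the matching logic ArcGIS actually uses: the abbreviation table, the function names, and the two-street reference set are all hypothetical, with only the street names taken from the example above.

```python
# Hypothetical lookup from spelled-out or abbreviated types to one standard form.
STREET_TYPES = {
    "lane": "Ln", "ln": "Ln", "road": "Rd", "rd": "Rd", "drive": "Dr",
    "dr": "Dr", "street": "St", "st": "St", "way": "Wy", "wy": "Wy",
}


def parse_address(address):
    """Split '19792 Smith St' into (19792, 'Smith', 'St')."""
    tokens = address.replace(".", "").replace(",", "").split()
    if len(tokens) < 2 or not tokens[0].isdigit():
        raise ValueError(f"Cannot parse address: {address!r}")
    number = int(tokens[0])
    last = tokens[-1].lower()
    if len(tokens) > 2 and last in STREET_TYPES:
        return number, " ".join(tokens[1:-1]).title(), STREET_TYPES[last]
    return number, " ".join(tokens[1:]).title(), None


def match_street(address, known_streets):
    """Match a recorded address against (name, type) pairs from a street database."""
    _, name, street_type = parse_address(address)
    if (name, street_type) in known_streets:
        return (name, street_type), "exact"
    # Same name, different type: accept only if the name is unambiguous.
    candidates = sorted(s for s in known_streets if s[0] == name)
    if len(candidates) == 1:
        return candidates[0], "type corrected"
    return None, "unmatched"


known = {("Smith", "Rd"), ("Priscilla", "Ln")}
print(match_street("19792 Smith St", known))    # (('Smith', 'Rd'), 'type corrected')
print(match_street("19792 Smith Road", known))  # (('Smith', 'Rd'), 'exact')
```

The sketch corrects the type only when exactly one street has that name. If a city had both a Smith Rd and a Smith St, the address would be left unmatched for a person to review, which is the safer choice.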

The attribute table also shows the address ranges on the right and left sides of the street (columns labeled "R_f_add" and "L_t_add" in Figure 1.11). Traditionally, left-side addresses are odd and right-side addresses are even. For Smith Rd, for example, the address range on the right (even) side of the street runs from 19600 to 19792, and the range on the left (odd) side runs from 19601 to 19793, parallel to the even-side addresses. These are addresses that were known to exist on that street when the database was assembled.
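
Address ranges like these are what allow a house number to be placed along a street even when the database has no record of that individual building. A common approach is to work out which side of the street the number belongs to from its parity, and then estimate how far along the segment it falls by linear interpolation within the range. The Python sketch below illustrates the idea with the Smith Rd ranges; the class and field names are our own, and we are assuming this simple interpolation rather than describing the exact method ArcGIS applies.

```python
from dataclasses import dataclass


@dataclass
class StreetSegment:
    """One street segment with from/to address ranges for each side."""
    name: str
    left_from: int
    left_to: int
    right_from: int
    right_to: int


def locate_on_segment(number, seg):
    """Return (side, fraction along the segment), or None if out of range."""
    sides = (
        ("left", seg.left_from, seg.left_to),
        ("right", seg.right_from, seg.right_to),
    )
    for side, start, end in sides:
        low, high = min(start, end), max(start, end)
        same_parity = number % 2 == start % 2  # odd with odd, even with even
        if same_parity and low <= number <= high:
            fraction = 0.0 if end == start else (number - start) / (end - start)
            return side, fraction
    return None


smith_rd = StreetSegment("Smith Rd", 19601, 19793, 19600, 19792)
print(locate_on_segment(19792, smith_rd))  # ('right', 1.0)
print(locate_on_segment(19697, smith_rd))  # ('left', 0.5)
print(locate_on_segment(20000, smith_rd))  # None
```

The fraction is only an estimate: it assumes house numbers are spaced evenly along the segment, which is rarely exactly true, so the resulting point may sit somewhat up or down the street from the real building.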

The menu in Figure 1.10 also has a selection for labeling the data displayed on the map. In the case of the street map this is a very useful function, as it toggles the street names on and off so that you can locate a street by name. At the level of magnification in Figure 1.10, such labels would be hard to read. You can zoom in or out by selecting a magnifying glass icon from the tool bar at the top left of the ArcMap screen: + (plus) to make the image more detailed (Figure 1.12), and - (minus) to decrease the detail.

FIGURE 1.12 A detailed view of the streets of Riverside

Now if we display the menu in Figure 1.10, and click on Label Features, we can activate the street name labels.

FIGURE 1.13 Streets of Riverside labeled

In Figure 1.13 you can see Cleveland Street right in the middle, running at about a 45-degree angle across the map; as you go from the lower left to the upper right, Cleveland crosses Monroe. A number of streets in this section of the city seem to be named after U.S. presidents. The streets are displayed in gray, while the black lines that sometimes overlap with the streets are the block group boundaries. In the table of contents window on the left side of the screen, you can turn the display of each layer of the map on and off by clicking on the check mark to the left of the name of the layer. We can turn off the block group map and just display the streets.

FIGURE 1.14 Streets of Riverside

In Figure 1.14 you can see that not only are the darker boundary lines gone from the display, but the background color of the block group layer is also gone.
