Statistics of Foreclosure - Data Mashups in R

[14] "Households..Family.households..Householder.75.to.84.years"
[15] "Households..Family.households..Householder.85.years"...

Examining our downloaded data, we see that the first line of the text file contains IDs that make little sense, while the second line describes those IDs. The skip=1 option in read.table() lets us skip the first line, so the headers of censusTable are taken from the second line. Also keep one of R's quirks in mind: it replaces spaces, and other characters that are not valid in R names, with periods.
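Here is a minimal sketch of both behaviours. It uses an invented three-line file passed to read.table() through its text argument rather than the real census download. The column codes (HC01_VC01 and so on) and the values are made up for illustration.

```r
# Invented stand-in for the downloaded file: line 1 holds cryptic IDs,
# line 2 holds readable descriptions, and the data starts on line 3.
txt <- c("GEO_ID,GEO_ID2,HC01_VC01",
         "Geography Identifier,Geography Identifier 1,Households: Family households",
         "1400000US42101000100,42101000100,1234")

# skip = 1 drops the ID line, so header = TRUE takes names from line 2.
censusDemo <- read.table(text = txt, sep = ",", header = TRUE, skip = 1)

# With check.names = TRUE (the default), the headers pass through
# make.names(), which turns spaces and the colon into periods.
print(names(censusDemo))
```

This should print "Geography.Identifier", "Geography.Identifier.1" and "Households..Family.households". The double period comes from the colon and the space that follows it, which is where names like the ones in the listing above get their "..".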

The columns we need are in different tables. censusTable1 contains the tracts, censusTable2 has all the interesting survey variables, while myTrtFC and myPolyData hold the foreclosure and shape information. The str() and merge() functions are quite useful here. The Geography.Identifier.1 column in censusTable2 looks familiar: it matches the STFID column of the PolyData table extracted from our shapefile:

> str(myPolyData)
Classes 'PolyData' and 'data.frame': 381 obs. of 9 variables:
 $ PID : int 1 2 3 4 5 6 7 8 9 10 ...
 $ ID : Factor w/ 381 levels "1","10","100",..: 1 112 223 316 327 ...
 $ FIPSSTCO: Factor w/ 1 level "42101": 1 1 1 1 1 1 1 1 1 1 ...
 $ TRT2000 : Factor w/ 381 levels "000100","000200",..: 1 2 3 4 5 6 7 ...
 $ STFID : Factor w/ 381 levels "42101000100",..: 1 2 3 4 5 6 7 8 9 10 ...
(snip)
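Before merging, it can help to confirm how many keys actually line up. Below is a small sketch with invented vectors standing in for myPolyData$STFID (a factor, as str() shows) and a census ID column. The sketch assumes the census IDs were parsed as numbers, which may not match the real table, so both sides are converted to the same character form first.

```r
# Invented stand-ins: STFID is a factor in myPolyData, while a numeric
# census ID column is assumed here for illustration.
stfid   <- factor(c("42101000100", "42101000200", "42101000300"))
geo_id2 <- c(42101000100, 42101000200, 42101099900)

# Put both keys in the same character form before comparing;
# sprintf("%.0f", ...) writes whole numbers without scientific notation.
geo_chr   <- sprintf("%.0f", geo_id2)
stfid_chr <- as.character(stfid)

matched   <- geo_chr %in% stfid_chr        # census rows that have a tract shape
unmatched <- setdiff(geo_chr, stfid_chr)   # census IDs with no shape

cat(sum(matched), "of", length(geo_chr), "IDs match\n")
print(unmatched)
```

Here two of the three IDs match, and "42101099900" is reported as having no corresponding shape.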

# select the columns with interesting data
> ct1<-censusTable2[,c(1,2,5,6,7,16,42,54,56,75,76,77,93,94,105)]
# merge() joins only two tables at a time, so we merge in two steps
> ct2<-merge(x=censusTable1, y=myPolyData, by.x='GEO_ID2', by.y='STFID')
> ct3<-merge(x=ct2, y=ct1, by.x='GEO_ID2', by.y='Geography.Identifier.1')
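By default merge() performs an inner join (all = FALSE), so a tract missing from either table silently drops out. Here is a toy sketch of the same two-step merge. The tables tracts, shapes and survey and all their values are invented.

```r
# Invented miniature versions of censusTable1, myPolyData and ct1.
tracts <- data.frame(GEO_ID2 = c("A", "B", "C"), tract = c(1, 2, 3),
                     stringsAsFactors = FALSE)
shapes <- data.frame(STFID = c("A", "B", "D"), PID = c(10L, 20L, 40L),
                     stringsAsFactors = FALSE)
survey <- data.frame(Geography.Identifier.1 = c("A", "B", "C"),
                     households = c(100, 200, 300),
                     stringsAsFactors = FALSE)

# The key columns have different names, so by.x and by.y name each side.
step1 <- merge(x = tracts, y = shapes, by.x = "GEO_ID2", by.y = "STFID")
step2 <- merge(x = step1, y = survey, by.x = "GEO_ID2",
               by.y = "Geography.Identifier.1")

# Only A and B survive: C has no shape and D has no census row.
print(step2)
```

The result keeps the x-side key name (GEO_ID2). Passing all.x = TRUE instead would keep unmatched rows from x, padded with NA, which can make failed joins easier to spot.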

Now we have a connection between the tracts and our census data. We also need to include the foreclosure data. We have myTrtFC, but another merge would be easier if it were a data frame:

> myTrtFC<-as.data.frame(myTrtFC)
> names(myTrtFC)<-c("PID","FCs")
> ct<-merge(x=ct3,y=myTrtFC,by.x="PID",by.y="PID")
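Assigning two names implies that as.data.frame() returns two columns. That would be the case if myTrtFC is a one-way count table built with table(), though this excerpt does not show how it was built, so treat it as an assumption. Below is a toy sketch of that path. The variables fcPID, trtDF and tractInfo are hypothetical.

```r
# Hypothetical input: one polygon ID (PID) per foreclosure event.
fcPID <- c(1L, 1L, 2L, 3L, 3L, 3L)
trtFC <- table(fcPID)                    # foreclosure count per PID

# A one-way table becomes a two-column data frame: the IDs, then Freq.
trtDF <- as.data.frame(trtFC)
names(trtDF) <- c("PID", "FCs")

# The ID column comes back as a factor; going through as.character()
# recovers the ID values rather than the internal level codes.
trtDF$PID <- as.integer(as.character(trtDF$PID))

tractInfo <- data.frame(PID = 1:4, tract = c("A", "B", "C", "D"),
                        stringsAsFactors = FALSE)
merged <- merge(x = tractInfo, y = trtDF, by = "PID")

# Tract D has no foreclosures, so the inner join drops it.
print(merged)
```

The integer conversion is optional here. It simply keeps the key type consistent with the integer PID column that str() showed for myPolyData.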

Renaming the columns will make scripting easier later on. Of course, this is just a personal preference:
