[14] "Households..Family.households..Householder.75.to.84.years"
[15] "Households..Family.households..Householder.85.years"...
Examining our downloaded data, we see that the first line in the text file contains IDs that make little sense, while the second line describes those IDs. The skip=1 option in read.table allows us to skip the first line. By skipping it, the headers of censusTable are extracted from the second line. Also keep one of R's quirks in mind: it likes to replace spaces in column names with a period.
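The following is a minimal sketch of both behaviors, using a few in-memory lines rather than the real census file. The sample text, the "|" separator, and the names sample_lines, demo, and raw are assumptions made for illustration only.

```r
# Hypothetical sample: line 1 holds cryptic IDs, line 2 holds descriptive labels.
sample_lines <- c(
  "GEO.id|HC01_VC01",
  "Geography Identifier|Family households",
  "42101000100|120"
)

# skip = 1 drops the ID line, so header = TRUE picks up the labels instead.
demo <- read.table(text = sample_lines, sep = "|", header = TRUE, skip = 1)
names(demo)  # spaces in the labels have been replaced by periods

# check.names = FALSE keeps the labels exactly as written in the file.
raw <- read.table(text = sample_lines, sep = "|", header = TRUE, skip = 1,
                  check.names = FALSE)
names(raw)
```

Keeping the default name conversion is usually more convenient, since names without spaces can be used with the $ operator without quoting.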
The columns we need are in different tables. censusTable1 contains the tracts, censusTable2 has all the interesting survey variables, while FCs and PolyData have foreclosure and shape information. The str() and merge() functions can be quite useful in this case. The Geography.Identifier.1 column in censusTable2 looks familiar: it matches STFID from the PolyData table (myPolyData) extracted from our shapefile:
> str(myPolyData)
Classes 'PolyData' and 'data.frame': 381 obs. of 9 variables:
$ PID : int 1 2 3 4 5 6 7 8 9 10 ...
$ ID : Factor w/ 381 levels "1","10","100",..: 1 112 223 316 327 ...
$ FIPSSTCO: Factor w/ 1 level "42101": 1 1 1 1 1 1 1 1 1 1 ...
$ TRT2000 : Factor w/ 381 levels "000100","000200",..: 1 2 3 4 5 6 7 ...
$ STFID : Factor w/ 381 levels "42101000100",..: 1 2 3 4 5 6 7 8 9 10 ...
(snip)
# select the columns with interesting data
> ct1 <- censusTable2[, c(1,2,5,6,7,16,42,54,56,75,76,77,93,94,105)]
# merge() combines two tables at a time
> ct2 <- merge(x=censusTable1, y=myPolyData, by.x='GEO_ID2', by.y='STFID')
> ct3 <- merge(x=ct2, y=ct1, by.x='GEO_ID2', by.y='Geography.Identifier.1')
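To see what by.x and by.y do, here is a small sketch on toy tables. The data frames tracts and survey and all of their values are invented stand-ins, not the real census tables.

```r
# Toy stand-ins for the tract and survey tables (all values invented).
tracts <- data.frame(GEO_ID2 = c("42101000100", "42101000200", "42101000300"),
                     PID = 1:3, stringsAsFactors = FALSE)
survey <- data.frame(Geography.Identifier.1 = c("42101000200", "42101000100"),
                     households = c(850, 1200), stringsAsFactors = FALSE)

# by.x / by.y name the key column in each table; the result keeps the x name.
joined <- merge(x = tracts, y = survey,
                by.x = "GEO_ID2", by.y = "Geography.Identifier.1")

# The default is an inner join: the third tract has no survey row and is dropped.
nrow(joined)

# all.x = TRUE keeps unmatched rows from x and fills the missing values with NA.
kept <- merge(x = tracts, y = survey,
              by.x = "GEO_ID2", by.y = "Geography.Identifier.1", all.x = TRUE)
nrow(kept)
```

Comparing the row count before and after a merge is a quick way to notice keys that failed to match.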
Now we have a connection between the tracts and our census data. We also need to include the foreclosure data. We have myTrtFC, but another merge would be easier if it were a data frame:
> myTrtFC<-as.data.frame(myTrtFC)
> names(myTrtFC)<-c("PID","FCs")
> ct<-merge(x=ct3,y=myTrtFC,by.x="PID",by.y="PID")
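The same conversion can be tried on toy data. This sketch assumes the per-tract counts were produced by table(), which the text does not confirm; fc_pids, fc_df, and tracts are hypothetical names with invented values.

```r
# Hypothetical input: one tract PID per foreclosure record (values invented).
fc_pids <- c(1, 1, 2, 3, 3, 3)

# Assumption: the per-tract counts come from table().
fc_counts <- table(fc_pids)

# as.data.frame() turns the table into two columns: the level and its count.
fc_df <- as.data.frame(fc_counts, stringsAsFactors = FALSE)
names(fc_df) <- c("PID", "FCs")
fc_df$PID <- as.integer(fc_df$PID)  # the levels come back as text

# Toy tract table standing in for ct3.
tracts <- data.frame(PID = 1:4,
                     tract = c("000100", "000200", "000300", "000400"))

# Both tables share a PID column, so a single `by` is enough.
merged <- merge(x = tracts, y = fc_df, by = "PID")
merged
```

Tract 4 has no foreclosure records in this toy input, so the default inner join leaves it out; all.x = TRUE would keep it with an NA count.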
Changing the names for each column will facilitate scripting later on. Of course, it's just a
personal preference: