Correlation
Can we find any associations between the attributes of each tract and its foreclosures? Correlation is a basic statistic used frequently by researchers and statisticians. In R, we can visualize the pairwise correlations among many variables with pairs(), the scatterplot-matrix function from the base graphics package. The following code enhances pairs() so that the upper triangle shows the actual correlation coefficients, while the lower triangle shows scatterplots with a smoothed (lowess) trend line:
# Make a subset of the table (ct) that only includes covariates and foreclosures
corTable1 <- ct[, c(18, 20, 21, 22, 23, 24, 25, 27, 28, 31)]
# First create a function (panel.cor) for the upper triangle: it prints the
# Pearson correlation of each pair, with font size proportional to |r|
panel.cor <- function(x, y, digits = 3, prefix = "", cex.cor, ...) {
  usr <- par("usr"); on.exit(par(usr = usr))  # restore plot coordinates on exit
  par(usr = c(0, 1, 0, 1))                     # unit square so the text is centered
  r <- cor(x, y, method = "pearson", use = "complete.obs")
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste0(prefix, txt)
  if (missing(cex.cor)) cex.cor <- 0.8 / strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor * abs(r))
}
# Now plot using pairs(): smoothed scatterplots below the diagonal,
# correlation coefficients above it
pairs(corTable1, lower.panel = panel.smooth, upper.panel = panel.cor,
      main = "2009 Housing Census Data",
      labels = c("TotPop", "Families", "NonFamilies", "TravelTime", "TT+90",
                 "Disabled", "Income", "Poverty", "Occupied", "FCS"),
      font.labels = 2)
The resulting graph is shown in Figure 2-4.
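Besides reading them off Figure 2-4, the coefficients shown in the upper triangle can be printed as a plain matrix, which is easier to scan or export. The sketch below is a minimal illustration that assumes a small synthetic data frame (demo, with hypothetical columns TotPop, Families and Income whose values are invented, not taken from the census data); it uses the same cor() settings as panel.cor.

```r
# Hypothetical stand-in for corTable1: synthetic columns invented for illustration
set.seed(42)
n <- 50
TotPop   <- round(runif(n, 1000, 8000))
Families <- round(0.25 * TotPop + rnorm(n, sd = 50))  # built to track TotPop
Income   <- round(rnorm(n, mean = 55000, sd = 8000))  # generated independently
demo <- data.frame(TotPop, Families, Income)

# Same settings as panel.cor: Pearson correlation, rows with NAs dropped
cm <- cor(demo, method = "pearson", use = "complete.obs")
print(round(cm, 3))
# With the real data this would be: round(cor(corTable1, use = "complete.obs"), 3)
```

The matrix is symmetric with ones on the diagonal, so the upper triangle of the plot carries all of its information.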
In Figure 2-4, we observe that total population, total family households, and total housing units are all highly correlated (as would be expected). We also observe that median household income, total travel time, number of occupants below the poverty level, and number of non-family households show little linear correlation with each other or with any of the other variables. One interesting observation is the relationship between total population and median household income: one or two less populated tracts have quite high household incomes, while the remaining tracts follow a flat trend, with median household income roughly constant regardless of tract population. Among the other variables, tracts with a high number of individuals below the poverty level tend to have median household incomes at the lower end of the range, and a similar clumping is observed among the low-income tracts. Finally, there is a non-linear relationship between foreclosure rates and median income: tracts above a certain income threshold experience almost no foreclosure events.
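Because Pearson's coefficient measures only linear association, it can understate a steep, threshold-like relationship such as the one between foreclosures and income. A rank-based measure such as Spearman's rho captures any monotonic relationship. The sketch below is purely illustrative: the income and foreclosure values are synthetic (an assumed exponential fall-off), not the census data.

```r
# Synthetic illustration only: these numbers are made up, not the census data
set.seed(1)
income <- runif(200, min = 30000, max = 100000)  # hypothetical median incomes
# Foreclosures fall off steeply with income and are near zero for richer tracts
foreclosures <- 200 * exp(-(income - 30000) / 5000)

# Pearson measures linear association; Spearman works on ranks,
# so it captures any monotonic relationship, linear or not
pearson  <- cor(income, foreclosures, method = "pearson")
spearman <- cor(income, foreclosures, method = "spearman")
print(round(c(pearson = pearson, spearman = spearman), 3))
```

Here the relationship is perfectly monotonic, so Spearman's rho is -1 while Pearson's r is noticeably weaker. Changing method = "pearson" to method = "spearman" inside panel.cor would make the upper triangle of the scatterplot matrix report rank correlations instead.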