How to do it…
In this recipe, we'll use the Virginia census family data to examine the relationship between
the number of families and the number of housing units. Does having more families imply
more housing units? We probably expect these two variables to have a fairly tight linear
relationship, so this should be a clear test.
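1. First, let's load the data and pull out the two fields we're interested in:
;; Assumes incanter.io is loaded and incanter.core is aliased as i
;; (later steps also use incanter.stats as s and incanter.charts as c).
(def family-data
  (incanter.io/read-dataset "data/all_160_in_51.P35.csv"
                            :header true))
;; Column names are plain keywords; a double-colon keyword such as ::HU100
;; would be namespace-qualified and would not match the column.
(def housing (i/sel family-data :cols :HU100))
(def families (i/sel family-data :cols :P035001))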
2. Computing the linear regression takes just one line:
(def families-lm
  (s/linear-model housing families :intercept false))
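To see what a fit without an intercept amounts to, here is a minimal sketch in plain Clojure, independent of Incanter. It assumes a single predictor, in which case the least-squares slope of a line forced through the origin is the sum of x*y divided by the sum of x*x. The function name slope-through-origin and the sample numbers are hypothetical and are not taken from the census data.

```clojure
;; Sketch only: least-squares slope for y ≈ b * x (no intercept term),
;; where b = sum(x*y) / sum(x*x).
;; `slope-through-origin` and the sample vectors are hypothetical.
(defn slope-through-origin
  [xs ys]
  (/ (reduce + (map * xs ys))
     (reduce + (map * xs xs))))

;; Made-up values standing in for families (x) and housing units (y).
(def sample-families [100.0 200.0 300.0 400.0])
(def sample-housing  [150.0 310.0 440.0 610.0])

(def slope (slope-through-origin sample-families sample-housing))
```

For these made-up values, the slope comes out at roughly 1.51 housing units per family. With :intercept false, we're asking for this kind of line: zero families should predict zero housing units.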
3. The output of s/linear-model is a map that contains a lot of useful
information, including the regression coefficients. We can get the r-square value
(roughly, how well the model explains the variance in the data) and the p-value
of the F test (how significant the relationship is). High F values are associated with
lower p-values, which is to say that high F values imply a lower probability that the
relationship is the result of chance:
user=> (:r-square families-lm)
0.959498864188327
user=> (:f-prob families-lm)
1.1102230246251565E-16
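As a rough illustration of what an r-square value measures, the following sketch computes it in plain Clojure using the common textbook definition, 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the sum of squared deviations from the mean. This is an assumption about the definition, not a description of Incanter's internals; libraries can differ here, especially for models without an intercept, so it may not reproduce the value above. The function name r-squared and the sample numbers are hypothetical.

```clojure
;; Sketch only: r-square as 1 - SSE/SST (textbook, mean-centered definition).
;; `r-squared` and the sample vectors are hypothetical.
(defn r-squared
  [ys fitted]
  (let [n      (count ys)
        mean-y (/ (reduce + ys) n)
        square (fn [d] (* d d))
        ;; SSE: squared differences between observed and fitted values.
        sse    (reduce + (map (fn [y f] (square (- y f))) ys fitted))
        ;; SST: squared differences between observed values and their mean.
        sst    (reduce + (map (fn [y] (square (- y mean-y))) ys))]
    (- 1.0 (/ sse sst))))

;; Made-up observed values and made-up fitted values from some model.
(def observed   [150.0 310.0 440.0 610.0])
(def predicted  [151.0 302.0 453.0 604.0])

(def sample-r2 (r-squared observed predicted))
```

A value close to 1 means the fitted values track the observed ones closely; a perfect fit gives exactly 1.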
4. The F test looks good, as does the r-square value. Our hypothesis looks like it
probably holds. Let's look at a graph of the data, too, though:
(def housing-chart
  (doto
    (c/scatter-plot families housing
                    :title "Relationship of Housing to Families"
                    :x-label "Families"
                    :y-label "Housing"
                    :legend true)
 