Using PCA to graph multi-dimensional data
So far, we've been limiting ourselves to two-dimensional data. After all, the human mind
has a lot of trouble dealing with more than three dimensions, and even two-dimensional
visualizations of three-dimensional space can be difficult to comprehend.
However, principal component analysis (PCA) can help. It projects higher-dimensional data
onto a lower-dimensional space, choosing the projection that captures the maximum amount of
variance and so preserves the most significant relationships in the data. This makes the
data easier to visualize in two or three dimensions, and it also provides a way to select
the most relevant features in a dataset.
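To make "captures the maximum amount of variance" concrete, here is a minimal sketch in plain Clojure, without Incanter. The sample points and the helper names (mean, variance, project) are hypothetical and are not part of the recipe. The two directions are picked by hand for illustration, whereas PCA would compute the direction of greatest variance itself. The sketch only shows that, for points lying near a line, almost all of the variance survives a projection onto that line.

```clojure
;; Illustrative sketch only: plain Clojure, no Incanter. The sample points and
;; helper names below are hypothetical, not taken from the recipe.

(defn mean
  "Arithmetic mean of a sequence of numbers, as a double."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn variance
  "Population variance of a sequence of numbers."
  [xs]
  (let [m (mean xs)]
    (mean (map #(let [d (- % m)] (* d d)) xs))))

(defn project
  "Projects each 2-D point onto the direction [dx dy] (assumed to be of unit
  length), returning one coordinate per point."
  [[dx dy] points]
  (map (fn [[x y]] (+ (* x dx) (* y dy))) points))

;; Points scattered close to the line y = x.
(def points [[1.0 1.1] [2.0 1.9] [3.0 3.2] [4.0 3.8] [5.0 5.1]])

(def r (/ 1.0 (Math/sqrt 2.0)))
(def along  [r r])       ; unit vector along the diagonal
(def across [r (- r)])   ; unit vector perpendicular to the diagonal

(def var-along  (variance (project along points)))
(def var-across (variance (project across points)))

(println "variance along the diagonal: " var-along)
(println "variance across the diagonal:" var-across)
```

Because the two directions are perpendicular, the two variances add up to the total variance of the original x and y columns. Keeping only the coordinate along the diagonal therefore discards very little information about these points, which is the trade-off PCA makes when it reduces many dimensions to two.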
In this recipe, we'll take the US census race data that we've worked with in previous
chapters and create a two-dimensional scatter plot of it.
Getting ready
We'll use the same dependencies in our project.clj file as we did in Creating Scatter Plots
with Incanter, and this set of imports in our script or REPL:
(require '[incanter.core :as i]
         '[incanter.charts :as c]
         '[incanter.io :as iio]
         '[incanter.stats :as s])
We'll use the aggregated census race data for all states. You can download this from
http://www.ericrochester.com/clj-data-analysis/data/all_160.P3.csv.
We'll assign it to the race-data variable:
(def race-data (iio/read-dataset "data/all_160.P3.csv"
                                 :header true))
How to do it...
We'll first summarize the data to make it more manageable and easier to visualize. Then we'll
use PCA to project it onto a two-dimensional space and graph this view of the data:
1. First, we need to summarize the columns that we're interested in, getting the total
population of each racial group by state:
(def fields [:P003002 :P003003 :P003004 :P003005
             :P003006 :P003007 :P003008])
 