If you compare the snippet you want with the current snippet, you should notice that the values in the second snippet match the values for Aruba. In other words, there's one row for every birth-rate value, paired with its year. That comes to 9,870 rows of data, plus a header.
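In the long format, every non-empty birth-rate cell in the wide table becomes one row. So the row count equals the number of non-empty value cells. The following is a minimal sketch that counts them. It assumes the wide layout used in this section: a header row with a country label followed by one column per year, then one row per country. The function name `count_long_rows` and the sample values are hypothetical illustrations, not data from birth-rate.csv.

```python
import csv

def count_long_rows(lines):
    """Count the rows the long (year,rate) format will have.

    Assumes the first row is a header (country label, then years) and
    each later row is one country. Every non-empty year cell becomes
    exactly one output row.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    # row[1:] drops the country-name column; empty strings are missing data
    return sum(1 for row in reader for value in row[1:] if value)

# Hypothetical two-country excerpt; NOT the real birth-rate.csv values
sample = [
    "Country,1960,1961,1962",
    "Country A,36.4,35.2,",
    "Country B,,40.1,39.8",
]
print(count_long_rows(sample))  # 4
```

Run against the real file (`with open('birth-rate.csv', newline='') as f: print(count_long_rows(f))`), the count should agree with the number of data rows in the transformed file, provided your copy of the data matches the one used here.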
How do you get the data into the format you want? Remember what you did with Python in Chapter 2, “Handling Data”? You loaded the CSV file in Python and then iterated over each row, printing the values in the format that you wanted. You can do the same thing here. In your favorite text editor, start a new file called transform-birth-rate.py. Make sure it's in the same directory as birth-rate.csv. Then enter the following script.
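import csv

# birth-rate.csv: one row per country, one column per year
reader = csv.reader(open('birth-rate.csv', 'r'), delimiter=',')
rows_so_far = 0

# Header for the new, long format
print('year,rate')
for row in reader:
    if rows_so_far == 0:
        # First row holds the column names: the country label, then the years
        header = row
        rows_so_far += 1
    else:
        # Skip column 0 (the country) and empty cells (no data that year)
        for i in range(len(row)):
            if i > 0 and row[i]:
                print(header[i] + ',' + row[i])
        rows_so_far += 1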
If you want to keep all your coding in R, you can try using Hadley Wickham's reshape package. It helps you shift data frames into the format you want.
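This should look familiar, but now break it down. You import the csv module and then load birth-rate.csv. Then you print the header for the new format, year,rate, and iterate through each row and column. The first row of birth-rate.csv holds the years, so the script stores it as header. For every other row, it skips the first column (the country) and any empty cells, and prints each remaining value next to its year. Run the script in your console and redirect the output to a new CSV file named birth-rate-yearly.csv.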
python transform-birth-rate.py > birth-rate-yearly.csv
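The same transformation can also be written with a couple of standard-library idioms. In the version below, `next()` takes the header instead of a row counter. `zip()` pairs each year with its value instead of indexing by column number, and it stops at the shorter of the two lists. `csv.writer` writes the file directly instead of relying on shell redirection. This is a sketch that assumes Python 3. The function names `long_rows` and `write_yearly` are hypothetical, not part of the original script.

```python
import csv

def long_rows(lines):
    """Yield (year, rate) pairs from wide-format birth-rate rows.

    Assumes column 0 is the country label and the remaining header
    cells are years, as in birth-rate.csv.
    """
    reader = csv.reader(lines)
    header = next(reader)
    years = header[1:]  # drop the country-label column
    for row in reader:
        for year, rate in zip(years, row[1:]):
            if rate:  # skip empty cells (no data for that year)
                yield year, rate

def write_yearly(in_path, out_path):
    """Write the long-format CSV directly, without a shell redirect."""
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        writer.writerow(['year', 'rate'])
        writer.writerows(long_rows(src))
```

To produce the file, call `write_yearly('birth-rate.csv', 'birth-rate-yearly.csv')`. Keeping the row logic in a generator makes it easy to test on a few hand-written lines before running it on the full file.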
Great. Now, to use histogram() for that matrix, go back to R and load the new data file with read.csv(). In case you skipped all the data-formatting steps (to save them for later), the new data file is online, so you can load it from a URL.
birth_yearly <-
  read.csv("http://datasets.flowingdata.com/birth-rate-yearly.csv")
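Before moving on in R, you might want to sanity-check the long file. The sketch below counts how many countries have a value for each year. Each year's group is the set of values a per-year histogram would draw from. It assumes the year,rate layout produced by the script above. The function name `rows_per_year` is hypothetical.

```python
import csv
from collections import Counter

def rows_per_year(lines):
    """Count the (year, rate) rows for each year in the long-format CSV.

    Assumes a 'year,rate' header followed by one row per value.
    """
    reader = csv.DictReader(lines)
    return Counter(row['year'] for row in reader)
```

For example, `with open('birth-rate-yearly.csv', newline='') as f: counts = rows_per_year(f)`. The sum of `counts.values()` should equal the number of data rows in the file. A year with an unexpectedly small count points to gaps in the source data.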