If you compare the snippet you want to the current snippet, you should
notice that the values in the second snippet match the values for Aruba.
So there's a row for every birth rate value that is accompanied by the year.
This results in 9,870 rows of data, plus a header.
How do you get the data into the format you want? Remember what you
did with Python in Chapter 2, “Handling Data”? You loaded the CSV file in
Python and then iterated over each row, printing the values in the format
that you wanted. You can do the same thing here. Start a new file in
your favorite text editor called transform-birth-rate.py. Make sure it's in
the same directory as birth-rate.csv. Then enter the following script.
import csv

# Load the original birth rate data
reader = csv.reader(open('birth-rate.csv', 'r'), delimiter=",")

rows_so_far = 0
print('year,rate')
for row in reader:
    if rows_so_far == 0:
        # The first row holds the column names (the years)
        header = row
        rows_so_far += 1
    else:
        for i in range(len(row)):
            # Skip the first column and any empty cells
            if i > 0 and row[i]:
                print(header[i] + ',' + row[i])
        rows_so_far += 1
Note: If you want to keep all your coding in R, you can try using
Hadley Wickham's reshape package. It helps you shift data frames into
the format you want.
This should look familiar, but now break it down. You import the csv
package and then load birth-rate.csv. Then you print the header and iterate
through each row and column so that the script outputs the data in the
format you want. Run the script in your console and save the output in a new
CSV file named birth-rate-yearly.csv.
python transform-birth-rate.py > birth-rate-yearly.csv
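If you want to check the logic before running it on the full file, the same row-and-column loop can be sketched as a small function and tried on a tiny in-memory sample. This is only a sketch, written for Python 3. The function name wide_to_long and the sample values are made up for illustration, and it assumes the first column holds a label such as the country name, which is why it gets skipped.

```python
import csv
import io


def wide_to_long(in_file, out_file):
    """Write one 'year,rate' row for every non-empty value in a wide CSV."""
    reader = csv.reader(in_file)
    writer = csv.writer(out_file, lineterminator="\n")
    header = next(reader)  # the first row holds the column names (the years)
    writer.writerow(["year", "rate"])
    for row in reader:
        # Pair each year with its value; [1:] skips the first (label) column
        for year, rate in zip(header[1:], row[1:]):
            if rate:  # leave out missing values
                writer.writerow([year, rate])


# Hypothetical sample for illustration only; not taken from birth-rate.csv
sample = "Country,1960,1961,1962\nAlpha,36.4,35.2,\nBeta,,49.9,50.3\n"

out = io.StringIO()
wide_to_long(io.StringIO(sample), out)
result = out.getvalue()
print(result)
```

Each non-empty cell becomes its own row next to its year, and the two empty cells in the sample are dropped, which mirrors what the script does with the real data.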
Great. Now use histogram() for that matrix. Go back to R and load the new
data file with read.csv(). In case you skipped all the data formatting stuff (to
save for later), the new data file is online so that you can load it from a URL.
birth_yearly <-
    read.csv("http://datasets.flowingdata.com/birth-rate-yearly.csv")