Case Study: City of Palo Alto Open Data - Enterprise Data Workflows with Cascading

Calibrating Metrics for the Recommender

A good next step is to use an analytics tool such as R to analyze and visualize the data about trees and roads. This step calibrates and tests the data products built so far. Take a look at the src/scripts/copa.R file, which is an R script that analyzes the tree and road data.

For example, Figure 8-4 shows a chart of the distribution of tree species in Palo Alto. American sweetgum (Liquidambar styraciflua) is the most common tree.

Figure 8-4. Summary analysis for tree data

There is also a density plot/bar chart of estimated tree heights, most of which fall in the 10- to 30-meter range. Palo Alto is known for its many tall eucalyptus and sequoia trees (the city name translates to “Tall Stick”), and these show up on the right side of the density plot, which is great for lots of shade. Overall, the trees span a wide range of estimated heights, which helps confirm that our approximation is reasonable to use.

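library(ggplot2)

# Folder holding the output of the workflow
dat_folder <- "~/src/concur/CoPA/out"

# Load the tab-separated tree data; "NULL" fields become NA.
# R recognizes the encoding name "UTF-8" (with a hyphen).
d <- read.table(file = paste(dat_folder, "tree/part-00000", sep = "/"),
                sep = "\t", quote = "", na.strings = "NULL", header = FALSE,
                encoding = "UTF-8")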
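As a rough illustration of the kind of calibration check described above, the following sketch tallies species and summarizes estimated heights. It is not taken from copa.R: the data frame `trees`, its column names `species` and `avg_height`, and all of its values are invented stand-ins, since the column layout of the real tree output is not shown here. The ggplot2 step is skipped if the package is not installed.

```r
# Hypothetical stand-in for the tree data. Column names and values are
# invented for illustration; they are NOT the real Palo Alto dataset.
set.seed(42)
trees <- data.frame(
  species = sample(c("species_a", "species_b", "species_c"),
                   size = 200, replace = TRUE, prob = c(0.5, 0.3, 0.2)),
  avg_height = round(runif(200, min = 5, max = 40), 1)  # meters
)

# Count trees per species, most common first.
species_counts <- sort(table(trees$species), decreasing = TRUE)
print(species_counts)

# Quartiles and mean of the estimated heights.
print(summary(trees$avg_height))

# Kernel density estimate of the heights, using base R.
height_density <- density(trees$avg_height)

# Build a density plot object only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  height_plot <- ggplot2::ggplot(trees, ggplot2::aes(x = avg_height)) +
    ggplot2::geom_density() +
    ggplot2::labs(x = "estimated height (m)", y = "density")
}
```

Calling `print(height_plot)` renders the chart. With the real data, the same calls would apply once the columns of `d` have been given names that match its actual layout.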
