The formula may be typed directly into the text box, or it may be built
interactively using the list boxes below the formula. Using the list boxes is
recommended, at least for entering column names, because it avoids spelling
errors.
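For readers who want to check a derived column outside the tool, the same kind of calculation can be sketched in Python. This is only an illustration under stated assumptions: the formula (price divided by sqFeet), the column name `price`, and the sample rows are hypothetical and are not taken from the dataset.

```python
# Hypothetical sample rows; the column name "price" and all values are assumptions.
homes = [
    {"mls": "A1", "price": 300000.0, "sqFeet": 2000.0},
    {"mls": "A2", "price": 450000.0, "sqFeet": 1800.0},
]


def add_price_per_sq_ft(rows):
    """Return copies of the rows with a derived pricePerSqFt column."""
    result = []
    for row in rows:
        new_row = dict(row)
        # Assumed formula: asking price divided by floor area.
        new_row["pricePerSqFt"] = row["price"] / row["sqFeet"]
        result.append(new_row)
    return result


home_costs = add_price_per_sq_ft(homes)
for row in home_costs:
    print(row["mls"], round(row["pricePerSqFt"], 2))
```

Referring to columns by a single, consistently spelled key serves the same purpose as picking column names from the list boxes: a misspelled name fails immediately instead of silently producing a wrong column.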
Display the new dataset in the location plot viewer.
Slide the left end of the pricePerSqFt range filter up to about 200 to remove
the typically priced homes.
Where are the remaining homes located? Does there appear to be a relationship
between pricePerSqFt and location?
In the Category drop-down, select “propertyType”.
Does property type help to explain why sellers of some of these homes would
expect a premium price for their home?
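The filtering and category steps above can also be approximated in code. The sketch below is a rough analogue, not the tool's behavior: the rows and the property type labels are made up, and the function name `premium_homes` is hypothetical.

```python
from collections import Counter

# Hypothetical rows; values and property type labels are made up for illustration.
home_costs = [
    {"pricePerSqFt": 150.0, "propertyType": "Single Family"},
    {"pricePerSqFt": 260.0, "propertyType": "Condo"},
    {"pricePerSqFt": 310.0, "propertyType": "Condo"},
    {"pricePerSqFt": 225.0, "propertyType": "Single Family"},
]


def premium_homes(rows, threshold=200.0):
    """Keep rows at or above the threshold, like raising the filter's lower bound."""
    return [row for row in rows if row["pricePerSqFt"] >= threshold]


remaining = premium_homes(home_costs)

# Tally the remaining homes by category, similar to selecting propertyType
# in the Category drop-down.
type_counts = Counter(row["propertyType"] for row in remaining)
print(type_counts.most_common())
```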
Exercise 3.2
View the newly created HomeCosts dataset in the parallel plot. Hide the
columns that mostly add clutter to the visualization, such as cul-de-sac,
elementary, jrHigh, mls, schoolDistrict, street and zip. Create two filters:
one containing homes with a pricePerSqFt under $200 and the other containing
homes with a pricePerSqFt over $200.
a. Compare the mean values of the two filter sets. What attributes of the data
differentiate those homes offered at a high price per square foot?
b. View the dataset in a correlation matrix and a scatter plot. Are there any
relationships found using these two viewers that would help to explain the
price premium?
c. The correlation between sqFeet and pricePerSqFt is slightly positive. In
your own words, explain why this might be valid, given that in most pricing
schemes, unit prices (pricePerSqFt) decrease as the quantity (sqFeet)
increases.
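The comparisons in this exercise can be cross-checked outside the viewers with a short script. The sketch below is a minimal illustration and assumes hypothetical rows: it splits the data at $200 per square foot, computes the column means of each group, and computes a Pearson correlation by hand. The helper names `column_means` and `pearson` are not part of the tool.

```python
from math import sqrt
from statistics import mean

# Hypothetical rows; column names follow the text, values are made up.
home_costs = [
    {"sqFeet": 1000.0, "pricePerSqFt": 100.0},
    {"sqFeet": 2000.0, "pricePerSqFt": 150.0},
    {"sqFeet": 3000.0, "pricePerSqFt": 250.0},
    {"sqFeet": 4000.0, "pricePerSqFt": 300.0},
]


def column_means(rows):
    """Mean of every numeric column across the given rows."""
    return {col: mean(row[col] for row in rows) for col in rows[0]}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Two filter sets; a home at exactly $200 would fall into neither group.
under = [row for row in home_costs if row["pricePerSqFt"] < 200.0]
over = [row for row in home_costs if row["pricePerSqFt"] > 200.0]

print("under $200:", column_means(under))
print("over $200:", column_means(over))

r = pearson(
    [row["sqFeet"] for row in home_costs],
    [row["pricePerSqFt"] for row in home_costs],
)
print("correlation(sqFeet, pricePerSqFt):", round(r, 3))
```

The made-up values say nothing about the real dataset; only the mechanics of the comparison carry over.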
Aggregating data for observation reduction
The dataset ZapataShoes.csv contains records of shoe sales by Zapata
Enterprises. Zapata sells three lines of shoes (Fazenda, Montanha, and Praia)
directly to consumers in the western USA. Each row in the dataset represents a
single sale of
 