Databases Reference
In-Depth Information
b. View each of the datasets in the boundary plot. In what geographic areas of
the Western States' Zapata market are these shoe lines selling best?
Combining datasets
When viewing the PraiaSales dataset in the boundary plot, we see the highest
sales volume in the Los Angeles, California (900) three-digit zip. One may ask
the question, “Is it highest simply because it has the greatest population?” and
“Are there other less-populated regions where the sales rate per capita is
higher?”
To answer these questions, data is needed that is not available in the dataset -
populations for each of the three-digit zip areas. Population data is readily
availab le from the US Census Bure au and can be download ed as needed ( ww w.
census.gov ) . The dataset Zip3Pop.c sv was previousl y downloa ded. It contai ns
population counts for all three-digit areas in the USA.
Open the file Zip3Pop.csv.
To be useful, the population data in Zip3Pop.csv needs to be combined (or
joined in database terminology) with the sales data in PraiaSales. The VisMiner
Control Center has a feature that implements dataset joins. The join feature
requires that each dataset contains one or more key columns that match a row or
rows in the other set. In the case of the PraiaSales and Zip3Pop.csv datasets,
each has the common Zip3.
Drag the PraiaSales dataset over the Zip3Pop.csv and release.
Select “Join datasets”.
A dialog opens requesting specification of the column names in each
dataset to be used to match rows in the other dataset. Check the Zip3
column listed under the Zip3Pop.csv header on the left.
Check the Zip3 column listed under the PraiaSales header on the right.
The Zip3 column in Zip3Pop.csv is to be matched with the Zip3 column in
PraiaSales. In this case, the two columns have the same name. This is not a
requirement. They just need to originate from the same source domain. The join
may also require multiple key columns to perform the join. For example, if
instead of three-digit zips, county populations were to be combined with
interesting county data collected in a different dataset. Since county names
in the USA are not unique, one would need both a county name column and a
state name column in each dataset in order to successfully match rows.
Click “OK” to join the two datasets.
 
Search WWH ::




Custom Search