a pair of shoes. There are almost 500,000 sales recorded in the dataset - too
many for the visualization tools. Columns in the dataset include: customer
number, customer name, street, city, state, zip, and shoe (line).
Open the dataset ZapataShoes.csv. (Owing to its size, you may need to wait
a little longer than normal.)
Review the summary statistics for the dataset.
Zapata would like to explore and compare sales in the geographic areas they
serve. In reviewing the available data, there are four columns containing
customer location information: street, city, state, and zip. As a starting
point, sales could be analyzed by zip, starting with a breakdown by three-
digit zip. Since there is no three-digit zip column in the dataset, one needs to
be created.
In the Control Center, right-click on the dataset; select “Create derived
dataset”.
Enter a name of “ShoesZip3”.
Click “Select All” to include all existing columns in the new dataset.
In the “Computed Columns” box, click “New”.
In the name box of the “New Column Definition” form, enter “Zip3”.
In the “Operators” list box, click “trunc” - short for truncate.
Check to ensure that the cursor is between the open and close parentheses.
In the “Available Columns” list box, click Zip.
In the “Operators” list box, click “/” for division.
Type in “100”. The formula should now read: trunc(Zip/100). It takes the
five-digit zip, divides it by 100, then truncates the digits following the
decimal point, leaving only the first three digits of the zip.
Click “Create” to add the newly defined column to the dataset.
Click “Create” again to build the new derived dataset.
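
For readers who want to check the arithmetic outside the tool, the formula trunc(Zip/100) can be sketched in Python. This is only an illustration: the function name zip3 and the sample zips are assumptions, not values taken from ZapataShoes.csv, and it presumes the zip is stored as a number, as the formula implies.

```python
import math


def zip3(zip_code: int) -> int:
    """Mirror trunc(Zip/100): divide by 100, then drop the fractional part."""
    return math.trunc(zip_code / 100)


# Hypothetical five-digit zips chosen for illustration only.
sample_zips = [80012, 85701, 99403]
zip3_values = [zip3(z) for z in sample_zips]
print(zip3_values)  # [800, 857, 994]
```

For non-negative whole numbers, integer division (zip_code // 100) gives the same result.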
View the summary statistics for your new dataset. There is a new column
named Zip3. It has a minimum value of 800, a maximum value of 994, and
a cardinality of 154.
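
The three summary statistics mentioned here (minimum, maximum, and cardinality, meaning the number of distinct values) can be sketched as follows. The list zip3_column is a hypothetical stand-in for the real Zip3 column, so its results will not match the figures above.

```python
# Hypothetical Zip3 values; the real column lives in the ShoesZip3 dataset.
zip3_column = [800, 857, 857, 994, 800, 921]

summary = {
    "min": min(zip3_column),
    "max": max(zip3_column),
    # Cardinality is the count of distinct values, not the row count.
    "cardinality": len(set(zip3_column)),
}
print(summary)  # {'min': 800, 'max': 994, 'cardinality': 4}
```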
There are still almost 500,000 rows in the new dataset - one for each shoe
sale. A dataset containing shoe sales totaled by Zip3 is needed.
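
As a rough sketch of what "totaled by Zip3" means, the snippet below counts one sale per row and groups the counts by three-digit zip. The rows are invented for illustration, and the tool's own aggregation feature may work differently.

```python
from collections import Counter

# Hypothetical sales rows as (customer number, five-digit zip); one row per sale.
sales = [(1, 80012), (2, 80045), (3, 85701), (4, 99403), (5, 85710)]

# Group by three-digit zip and count the rows in each group.
sales_by_zip3 = Counter(zip_code // 100 for _, zip_code in sales)

for zip3_value, total in sorted(sales_by_zip3.items()):
    print(zip3_value, total)
```

The result has one entry per distinct Zip3 value rather than one per sale, which is what makes it small enough to visualize.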