Advanced Topics in Initial Exploration and Dataset Preparation Using VisMiner - Visual Data Mining: The VisMiner Approach


a pair of shoes. There are almost 500,000 sales recorded in the dataset - too many for the visualization tools. Columns in the dataset include customer number, customer name, street, city, state, zip, and shoe (line).

Open the dataset ZapataShoes.csv. (Owing to its size, you may need to wait

a little longer than normal.)

Review the summary statistics for the dataset.

Zapata would like to explore and compare sales in the geographic areas they serve. The available data has four columns containing customer location information: street, city, state, and zip. As a starting point, sales could be analyzed by zip, beginning with a breakdown by three-digit zip (the first three digits of the five-digit zip). Since there is no three-digit zip column in the dataset, one needs to be created.

In the Control Center, right-click on the dataset; select “Create derived

dataset”.

Enter a name of “ShoesZip3”.

Click “Select All” to include all existing columns in the new dataset.

In the “Computed Columns” box, click “New”.

In the name box of the “New Column Definition” form, enter “Zip3”.

In the “Operators” list box, click “trunc” - short for truncate.

Check to ensure that the cursor is between the open and close parentheses.

In the “Available Columns” list box, click Zip.

In the “Operators” list box, click “/” for division.

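Type in “100”. The formula should now read: trunc(Zip/100). It takes the five-digit zip, divides it by 100, then truncates the digits following the decimal point, leaving the first three digits. For example, a zip of 84602 becomes 846.02 after the division and 846 after truncation.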

Click “Create” to add the newly defined column to the dataset.

Click “Create” again to build the newly derived dataset.
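
For readers who want to check the formula outside VisMiner, the computed column can be sketched in a few lines of Python. This is only an illustration of what trunc(Zip/100) does, not how VisMiner implements it; the function names zip3 and add_zip3, the representation of rows as dictionaries, and the sample zip values are all assumptions made for the example.

```python
import math


def zip3(zip_code):
    """Mirror the formula trunc(Zip/100): keep the first three digits of a five-digit zip."""
    return math.trunc(int(zip_code) / 100)


def add_zip3(rows, zip_column="Zip"):
    """Return copies of the rows with a derived Zip3 column; existing columns are kept."""
    return [{**row, "Zip3": zip3(row[zip_column])} for row in rows]


# Hypothetical sample rows; the real dataset has almost 500,000 of them.
sample = [{"Zip": "84602"}, {"Zip": "80013"}, {"Zip": "99499"}]
derived = add_zip3(sample)
```

Because every zip is positive, truncation and integer division give the same result here, so int(zip_code) // 100 would work equally well.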

View the summary statistics for your new dataset. There is a new column named Zip3. It has a minimum value of 800, a maximum value of 994, and a cardinality of 154 (that is, 154 distinct values).
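
The three statistics quoted for Zip3 are simple to state in code. The sketch below is not VisMiner's summary routine; the summarize function and the short list of sample values are hypothetical, chosen only to show what minimum, maximum, and cardinality mean for a column.

```python
def summarize(values):
    """Return the minimum, maximum, and cardinality (count of distinct values) of a column."""
    values = list(values)
    return {
        "min": min(values),
        "max": max(values),
        "cardinality": len(set(values)),
    }


# Hypothetical Zip3 values; repeated values count once toward cardinality.
zip3_values = [846, 800, 994, 846, 800]
stats = summarize(zip3_values)
```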

There are still almost 500,000 rows in the new dataset - one for each shoe

sale. A dataset containing shoe sales totaled by Zip3 is needed.
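
To make the goal concrete, here is a minimal sketch of what "totaled by Zip3" could look like, assuming each row represents exactly one sale, so that a total is simply a count of rows per Zip3 value. The function name sales_by_zip3 and the sample rows are hypothetical, and this is not the VisMiner procedure itself.

```python
from collections import Counter


def sales_by_zip3(rows):
    """Count sales per Zip3 value, assuming one row per shoe sale."""
    return Counter(row["Zip3"] for row in rows)


# Hypothetical rows that already carry the derived Zip3 column.
sales = [{"Zip3": 846}, {"Zip3": 800}, {"Zip3": 846}]
totals = sales_by_zip3(sales)
```

The result has one entry per distinct Zip3 value, so for the full dataset it would shrink from almost 500,000 rows to 154.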
