Advanced Topics in Initial Exploration and Dataset Preparation Using VisMiner - Visual Data Mining: The VisMiner Approach

After subtracting (or removing) the outliers, the resulting dataset contains only valid observations, which was the objective. The name that the Difference operation assigns to the new dataset combines the names of the two datasets involved. In this case it is “Table6.csv-Table6Outlier”. A shorter, more descriptive name is preferable.

Right-click on Table6.csv-Table6Outlier; select “View/Edit name and notes”.

Change the name to “Table6Valid”.

Click “Save”.
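
Outside VisMiner, the Difference step above can be approximated in a few lines of code. The following is a minimal sketch, not VisMiner's implementation: the function name `difference`, the sample columns `id` and `value`, and the outlier rule are all hypothetical, and rows are assumed to match only when every field is equal.

```python
import csv
import io


def difference(rows, outliers):
    """Return the rows that do not appear in outliers (all fields must match)."""
    outlier_keys = {tuple(sorted(r.items())) for r in outliers}
    return [r for r in rows if tuple(sorted(r.items())) not in outlier_keys]


# Made-up sample standing in for Table6.csv (columns are hypothetical).
sample = "id,value\n1,10\n2,11\n3,950\n4,12\n"
all_rows = list(csv.DictReader(io.StringIO(sample)))

# Hypothetical outlier rule, standing in for the Table6Outlier dataset.
outlier_rows = [r for r in all_rows if float(r["value"]) > 100]

# Equivalent of subtracting Table6Outlier from Table6.csv.
valid_rows = difference(all_rows, outlier_rows)
print([r["id"] for r in valid_rows])
```

Note that a valid row identical in every field to an outlier row would also be dropped by this sketch.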

A pattern check of experimental data

The dataset ResponseTime.csv contains the results of benchmark tests comparing two widely used web servers, identified in the dataset as “Platform A” and “Platform B”. The data was collected by using a simulator to repeatedly request web pages from the servers from hundreds of different locations and then measuring the response times.

Open ResponseTime.csv.

View the Summary Statistics.

AvgPgRsp is the average time in milliseconds that the server took to respond, given the requested page size (FileSize, in kilobytes) and the number of requests per second (TPS) hitting the server. Each average is based on thousands of requests from hundreds of clients requesting files of the specified size at the given traffic level. As the summary statistics show, requested file sizes ranged from 5 KB to 50 KB and traffic levels ranged from 100 to 550 requests per second.
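
The ranges read from the Summary Statistics can also be reproduced in code. The sketch below is an illustration under stated assumptions: the column names TPS, FileSize and AvgPgRsp come from the text, while the `Platform` column name, the `summarize` helper and all sample values are made up, since the actual file layout is not shown here.

```python
import csv
import io
import statistics


def summarize(rows, columns):
    """Compute min, max and mean for each named numeric column."""
    summary = {}
    for col in columns:
        values = [float(r[col]) for r in rows]
        summary[col] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
        }
    return summary


# Made-up rows standing in for ResponseTime.csv; values are not real results.
SAMPLE = """Platform,TPS,FileSize,AvgPgRsp
A,100,5,12.0
A,550,5,40.0
B,100,50,30.0
B,325,50,90.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
stats = summarize(rows, ["TPS", "FileSize", "AvgPgRsp"])
for col, s in stats.items():
    print(col, s["min"], s["max"], round(s["mean"], 2))
```

To run it against the real file, replace the `io.StringIO(SAMPLE)` source with an opened ResponseTime.csv, adjusting the column names if they differ.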

View ResponseTime.csv in a scatter plot.

Select TPS on the X axis and FileSize on the Y axis.
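
A non-visual counterpart of this scatter plot is to tabulate which (TPS, FileSize) combinations occur in the data, so that gaps in the experimental grid show up as missing cells. The following is a rough sketch with made-up observations; the helper `missing_cells` is hypothetical and assumes every design level appears at least once, so a level omitted entirely would go undetected.

```python
from itertools import product


def missing_cells(observations):
    """List (tps, file_size) grid cells that have no observation.

    The grid is inferred from the levels that occur in the data.
    """
    tps_levels = sorted({tps for tps, _ in observations})
    size_levels = sorted({size for _, size in observations})
    present = set(observations)
    return [
        cell for cell in product(tps_levels, size_levels) if cell not in present
    ]


# Made-up (TPS, FileSize) pairs; not taken from ResponseTime.csv.
observations = [
    (100, 5), (100, 25), (100, 50),
    (225, 5), (225, 50),
    (550, 5), (550, 25),
]

gaps = missing_cells(observations)
print(gaps)
```

In this toy data, a gap at high TPS and large file size is the kind a scatter plot would show in its upper right corner, while a gap in the middle of a column is more likely to be an omission.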

In this view, we clearly see the benchmark design. This is not randomly sampled data, but a carefully crafted experiment. The apparently missing observations in the upper right corner of the grid (A in Figure 3.7) represent trials that overloaded the server so much that it failed to respond to the page requests. The missing observations in the column at 225 TPS (B in Figure 3.7) were unintentional omissions made by the lab technician running the
