After subtracting (removing) the outliers, the resulting dataset contains
only valid observations, which was the objective. The Difference operation
names the result by combining the names of the two datasets involved,
in this case “Table6.csv-Table6Outlier”. A shorter, more descriptive name
is preferable.
Right-click on Table6.csv-Table6Outlier; select “View/Edit name and
notes”.
Change the name to “Table6Valid”.
Click “Save”.
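Outside the tool, the same subtraction can be sketched in Python with pandas as an anti-join. This is a minimal illustration, not the tool's own implementation: the two small tables, their column names (id, value), and their contents are invented stand-ins for Table6.csv and Table6Outlier.

```python
import pandas as pd

# Hypothetical stand-ins for Table6.csv and the Table6Outlier dataset;
# the column names and values are invented for illustration only.
table6 = pd.DataFrame(
    {"id": [1, 2, 3, 4, 5], "value": [10.0, 11.5, 98.0, 9.8, 10.4]}
)
table6_outlier = pd.DataFrame({"id": [3], "value": [98.0]})

# Left-merge on all shared columns with an indicator column, then keep the
# rows found only in the full table. This is a set difference (anti-join).
merged = table6.merge(table6_outlier, how="left", indicator=True)
table6_valid = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
table6_valid = table6_valid.reset_index(drop=True)

print(table6_valid)
```

Assigning the result to a variable called table6_valid plays the role of the rename step: the name is chosen by the analyst rather than derived from the two inputs.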
A pattern check of experimental data
The dataset ResponseTime.csv contains the results of benchmark tests
comparing two widely used web servers, identified in the dataset as “Platform A” and
“Platform B”. The data was collected by using a simulator to repeatedly
request web pages from the servers from hundreds of different locations and
then measuring the response times.
Open ResponseTime.csv.
View the Summary Statistics.
AvgPgRsp is the average time in milliseconds that it took the server to
respond, given the requested page size (FileSize in kilobytes) and the number of
requests per second (TPS) hitting the server. Each average was based on
thousands of requests to the server from hundreds of clients requesting files
of the specified size at the given traffic level. As the summary statistics
show, requested file sizes ranged from 5 KB up to 50 KB and traffic levels
ranged from 100 to 550 requests per second.
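For readers working outside the tool, a comparable summary can be sketched with pandas. The rows below are invented and only mimic the stated ranges. The column names AvgPgRsp, FileSize, and TPS come from the text, while the "Platform" column name is an assumption.

```python
import io
import pandas as pd

# Invented rows standing in for ResponseTime.csv. Only the column names
# AvgPgRsp, FileSize, and TPS come from the text; "Platform" is assumed.
csv_text = """Platform,FileSize,TPS,AvgPgRsp
Platform A,5,100,12.0
Platform A,50,100,40.0
Platform A,5,550,30.0
Platform B,5,100,15.0
Platform B,50,100,55.0
Platform B,5,550,42.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# describe() reports count, mean, std, min, quartiles, and max
# for each numeric column.
summary = df.describe()

# The min and max rows give the range of each experimental factor.
print(summary.loc[["min", "max"], ["FileSize", "TPS"]])
```

With the real file, the io.StringIO wrapper would be replaced by the path to ResponseTime.csv.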
View ResponseTime.csv in a scatter plot.
Select TPS on the X axis and FileSize on the Y axis.
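The same TPS-by-FileSize view can be approximated without a plot by counting observations in each grid cell. The sketch below uses an invented grid with one deliberately omitted combination; the factor levels are illustrative and do not reproduce the actual benchmark design.

```python
import pandas as pd

# Invented design grid: every FileSize/TPS pair is present except one,
# which mimics a gap in the experiment. Values are illustrative only.
rows = [
    (size, tps)
    for size in (5, 10, 50)
    for tps in (100, 225, 550)
    if (size, tps) != (50, 550)
]
df = pd.DataFrame(rows, columns=["FileSize", "TPS"])

# Count observations per cell; FileSize plays the Y axis, TPS the X axis.
grid = pd.crosstab(df["FileSize"], df["TPS"])

# Cells with a zero count are factor combinations with no observation.
missing = [
    (int(size), int(tps))
    for size in grid.index
    for tps in grid.columns
    if grid.loc[size, tps] == 0
]

print(grid)
print(missing)
```

A zero in the table corresponds to an empty position in the scatter plot, so gaps in the design stand out in the same way.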
In this view, we clearly see the benchmark design. This is not randomly
sampled data, but a carefully crafted experiment. The apparently missing
observations in the upper right corner of the grid (A in Figure 3.7) represent
trials that overloaded the server so much that it failed to respond to the page
requests. The missing observations in the column at 225 TPS (B in Figure 3.7)
were unintentional omissions made by the lab technician running the
 