Datasets
Two datasets are used in our experiments, the NASA dataset and the EPA dataset. The NASA dataset
was collected from July 1, 1995 through July 31, 1995 for a total of one month's requests from the
NASA server at Kennedy Space Center. The EPA dataset was collected from 23:53:25 August 29, 1995
through 23:53:07 August 30, 1995 for a total of 24 hours of requests from the EPA server at Research
Triangle Park, NC.
The NASA and EPA datasets are converted into sessions as described in the third section. Table 2
gives a summary of the datasets.
Performance Measurements
The efficiency of the MINCOST algorithm is evaluated using average cost saved and percentage of
average cost saved. The total cost of pages in the initial sessions and the total cost of pages in the final
sessions after MINCOST are calculated using the definition in Equation 1. The difference between
these two gives the total cost saved. The total cost saved is averaged over all sessions, giving the
average cost saved. The percentage of average cost saved is the average cost saved expressed as a
percentage of the average cost.
Total Cost Saved = Total Cost without MINCOST - Total Cost with MINCOST
Average Cost = Total Cost without MINCOST / Total Sessions
Average Cost Saved = Total Cost Saved / Total Sessions
Average Cost Saved (%) = (Average Cost Saved / Average Cost) * 100
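The four measurements above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `cost_savings` and the per-session cost lists are hypothetical, and the per-session costs are assumed to have already been computed with Equation 1.

```python
def cost_savings(costs_without, costs_with):
    """Compute the four measurements from per-session costs.

    costs_without[i] and costs_with[i] are the (assumed precomputed) costs of
    session i before and after MINCOST. Both names are hypothetical.
    """
    if len(costs_without) != len(costs_with):
        raise ValueError("both lists must describe the same sessions")
    total_sessions = len(costs_without)
    if total_sessions == 0:
        raise ValueError("at least one session is required")

    total_without = sum(costs_without)
    total_with = sum(costs_with)

    total_cost_saved = total_without - total_with
    average_cost = total_without / total_sessions
    average_cost_saved = total_cost_saved / total_sessions
    # Guard against a zero baseline cost, where a percentage is undefined.
    if average_cost == 0:
        raise ValueError("average cost is zero; percentage is undefined")
    percent_saved = (average_cost_saved / average_cost) * 100

    return total_cost_saved, average_cost, average_cost_saved, percent_saved


# Illustrative values only; these are not results from the NASA or EPA data.
saved, avg, avg_saved, pct = cost_savings([10.0, 20.0, 30.0], [8.0, 15.0, 25.0])
print(saved, avg, avg_saved, pct)
```

Because both averages divide by the same number of sessions, the percentage of average cost saved equals the total cost saved divided by the total cost without MINCOST, times 100.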
Experimental Parameters
The average cost saved and the percentage of average cost saved are measured with respect to the
probability threshold, depth bound, number of shortcuts, and the order of the N-gram. In each experiment,
we vary one parameter while keeping the others at their default values. The results are reported from a
10-fold cross-validation. The entire dataset is divided into ten equal portions. Each portion is used as the
Table 2. NASA and EPA datasets summary
                                  NASA Dataset    EPA Dataset
Total Log Records                 3,461,612       47,748
Total Sessions                    132,539         2,074
Unique URLs                       768             1,821
Average Session Length            3.134           4.222
(Number of Pages in a Session)
URL for Download (NASA dataset):  http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html
URL for Download (EPA dataset):   http://ita.ee.lbl.gov/html/contrib/EPA-HTTP.html
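The 10-fold split described under Experimental Parameters can be sketched as follows. This is a minimal sketch under stated assumptions: the text here does not say whether sessions are shuffled before splitting, so the code splits them in their given order, and it assumes the standard scheme in which each portion serves once as the held-out test set while the remaining nine form the training set. The function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(sessions, k=10):
    """Yield (train, test) pairs, one per fold, over a list of sessions.

    Assumption: contiguous folds in the given order, no shuffling.
    Fold sizes differ by at most one when len(sessions) is not divisible by k.
    """
    n = len(sessions)
    if k < 2 or k > n:
        raise ValueError("k must be between 2 and the number of sessions")

    # The first (n % k) folds receive one extra session.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        end = start + size
        test = sessions[start:end]
        train = sessions[:start] + sessions[end:]
        yield train, test
        start = end


# Illustrative use with placeholder session identifiers.
sessions = list(range(25))
for train, test in k_fold_splits(sessions, k=10):
    print(len(train), len(test))
```

The measurement for a given parameter setting would then be averaged over the ten folds.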
 