Information Technology Reference
In-Depth Information
Table 10.12 Comparison of rules accuracy and coverage rate for CSLogs data using
the XRules and FullTree final rule set

Support     1%               5%               10%
            AR%     CR%      AR%     CR%      AR%     CR%
XRules      72.72   66.04    61.74   40.7     56.9    23.21
            (298)   (298)    (20)    (20)     (3)     (3)
FullTree    78.53   48.77    78.73   20.35    76.9    20.3
            (61)    (61)     (4)     (4)      (2)     (2)
Table 10.13 Rule sets at support 10%

#    XRules               #    FullTree
1    1        Class(0)    1    X1(1)        Class(0)
2    12811    Class(1)    2    X1(12811)    Class(1)
3    6        Class(0)
higher. On this note, the rule sets of the XRules approach will typically have a higher
coverage rate, especially in the CSLOGS dataset, where subtrees do in fact occur at
many different positions due to variations in website navigation. However, one can
see that this sometimes comes at the cost of a reduction in AR. Constraining the
subtrees by position can be seen as more precise, but it naturally covers fewer cases.
To give a simple example, Table 10.13 shows the rule sets for the support value of
10%. One can observe that the FullTree rule set does not contain a rule that
corresponds to rule number 3 in XRules, even though XRules considered it frequent.
The reason is that the node with label “6” and “Class(0)” did not occur at the same
node/position in the DSM in 10% of the instances, so it was not considered frequent
and is not part of the FullTree rule set. The two matching rules correspond to the
first page accessed during the site navigation session, as it is labelled with
pre-order position 1, namely X1 in our approach (note that X0 is a virtual node in
the CSLOGS dataset that is always labelled with 0 and is removed in both approaches).
For support thresholds of 20% and 30%, no rules were extracted in our approach, while
XRules had only the single default rule for the majority class.
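The effect of the positional constraint on support can be illustrated with a small
sketch. This is not the XRules or FullTree implementation. It is a toy illustration
that assumes each session is flattened to a list of page labels in pre-order (with
the virtual X0 node already removed). The session data, the function names
(label_support, positional_support, frequent) and the 20% threshold are all
hypothetical and were chosen only to keep the example small.

```python
from collections import Counter

# Hypothetical toy sessions: each list holds page labels in pre-order,
# so index 0 corresponds to position X1, index 1 to X2, and so on.
sessions = [
    [1, 6],
    [1, 5, 6],
    [1, 5, 7, 6],
    [1, 5],
    [1, 7],
    [12811, 5],
    [12811, 7],
    [12811],
    [1],
    [12811, 5, 7],
]


def label_support(sessions):
    """Support of each label counted anywhere in a session (position-free)."""
    counts = Counter()
    for session in sessions:
        counts.update(set(session))  # count each label once per session
    return {label: c / len(sessions) for label, c in counts.items()}


def positional_support(sessions):
    """Support of each (position, label) pair (position-constrained)."""
    counts = Counter()
    for session in sessions:
        counts.update(set(enumerate(session, start=1)))
    return {key: c / len(sessions) for key, c in counts.items()}


def frequent(support, min_support):
    """Keys whose support reaches the threshold."""
    return {key for key, value in support.items() if value >= min_support}


if __name__ == "__main__":
    min_support = 0.2  # assumed threshold for the toy data
    free = frequent(label_support(sessions), min_support)
    constrained = frequent(positional_support(sessions), min_support)
    print("position-free frequent labels:", sorted(free))
    print("position-constrained frequent (position, label):", sorted(constrained))
```

In this toy data, label 6 appears in three of the ten sessions, so it is frequent when
position is ignored. It appears at positions 2, 3 and 4 once each, so no single
(position, 6) pair reaches the threshold. This mirrors the situation described for
rule number 3 in Table 10.13.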
10.5.3 Experiment Set 3—Academic Institution Web Log Data
The Academic Institution WebLogs data consists of Apache2 (v2.2.3) web server log files.
The WebLogs data was initially used in [16] as part of the DSM application. For
the purposes of this work, a setting of the WebLogs data similar to that
described in [16] has been used. The data was collected over a four-month period
in its native (default) format. During this period, all accesses to the website were
stored in the log files, while messages stored in the normal error message logs were
excluded. Each access to the website was then classified as “internal” (within the
university) or “external” (outside the university). The grouped user sessions were converted
 
 