assignment would be essentially arbitrary. Fragments that are too ambiguous to
evaluate in the context of this experiment would be similarly confusing to the
analyst in practice. Measuring the quality of such fragments is pointless; they
are all bad. For this reason, the metrics are not applied to ambiguous fragments.
Instead, the fragments are counted separately, and presented as an index of
ambiguity, indicating one aspect of the performance of the LCA overall.
\[
\mathit{Ambiguity} = \frac{\mathit{AmbiguousFragments}}{\mathit{AllFragments}} \tag{4}
\]
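As a minimal sketch of Equation (4), the index can be computed from a list of per-fragment flags. The function name `ambiguity_index` and the boolean-flag representation are assumptions made for illustration; how a fragment is judged ambiguous is decided upstream and is not modeled here.

```python
def ambiguity_index(is_ambiguous):
    """Return the fraction of fragments flagged as ambiguous (Equation 4).

    `is_ambiguous` is a hypothetical list with one boolean per fragment.
    """
    if not is_ambiguous:
        raise ValueError("at least one fragment is required")
    # Count the True flags and divide by the total number of fragments.
    return sum(1 for flag in is_ambiguous if flag) / len(is_ambiguous)


# Illustrative flags only (not experimental data): 1 ambiguous out of 4.
example_flags = [False, True, False, False]
example_ambiguity = ambiguity_index(example_flags)
```

Ambiguous fragments stay in the denominator, so the index describes the whole fragment population even though the quality metrics skip them.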
4.5 Trivial Fragments
By definition, fragments made up of one connection element always match one
session and have unit accuracy. Their effect is to increase the aggregate accuracy
in a meaningless way. For example, if half of all fragments are trivial, the
aggregate accuracy is guaranteed to be at least 0.5. This is an unnaturally inflated
score that does not represent the accuracy of non-trivial fragments. To correct
this, accuracy is not measured for trivial fragments, and aggregate results are
presented with a triviality score.
\[
\mathit{Triviality} = \frac{\mathit{TrivialFragments}}{\mathit{AllFragments}} \tag{5}
\]
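The inflation effect and its correction can be illustrated with a short sketch. It assumes each fragment is represented as a hypothetical `(element_count, accuracy)` pair, that ambiguous fragments have already been set aside, and that the aggregate is a plain mean. The accuracy values are invented for illustration and are not results from the experiment.

```python
from statistics import mean


def summarize(fragments):
    """Return (triviality, naive_accuracy, corrected_accuracy).

    `fragments` is a hypothetical list of (element_count, accuracy) pairs.
    A fragment with a single connection element is trivial.
    """
    total = len(fragments)
    nontrivial = [acc for count, acc in fragments if count > 1]
    # Equation (5): share of fragments with a single connection element.
    triviality = (total - len(nontrivial)) / total
    # Naive aggregate: trivial fragments contribute their unit accuracy.
    naive = mean(acc for _, acc in fragments)
    # Corrected aggregate: accuracy measured on non-trivial fragments only.
    corrected = mean(nontrivial) if nontrivial else None
    return triviality, naive, corrected


# Half of the fragments are trivial, so the naive mean cannot drop below 0.5.
sample = [(1, 1.0), (1, 1.0), (4, 0.5), (6, 0.25)]
triviality, naive, corrected = summarize(sample)
```

In this made-up sample the naive aggregate is 0.6875 while the non-trivial fragments average only 0.375, which is why the corrected accuracy is reported together with the triviality score.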
5 Results
5.1 Trivial and Ambiguous Fragments
Trivial fragments accounted for 5.25% to 9.33% of all fragments in Test A and
12.81% to 16.81% in Test B. The larger number of trivial fragments in Test B is
to be expected, as the naive method of Test A chains connections into fragments
much more readily than the discerning heuristic of Test B. It is important to
mention that some fragments were small because the sessions themselves were
small. Specifically, 3.48% to 7.21% of actual user sessions were trivial.
Ambiguous fragments accounted for 2.25% to 4.41% of all fragments in Test
A and 1.14% to 4.02% in Test B. There was no statistically significant difference
in ambiguity between the two methods.
5.2 Coverage
The distributions of coverage scores for Tests A and B are shown in Figures 8
and 9. The coverage of the fragments isolated by the heuristic appears to be
exponentially distributed, with about 75% of them having session coverage less
than 25%. The naively isolated fragments are distributed very differently, with
generalized peaks at coverages less than and greater than 50%.
5.3 Accuracy
The distributions of fragment accuracy for Tests A and B are shown in Figures 10
and 11. The figures show clearly that the heuristic isolates fragments that are
much more accurate than those of the naive method.