TABLE 2.4: Capturing Patterns with Repeated Values

#   Events             Attributes         Actual Values  RI  RA
1   hasNext
2   next               return.availSeats  4              0   0
3   getSeatsAvailable  return             4              1   1
4   add                flight.availSeats  4              1   1
5   hasNext
6   next               return.availSeats  7              0   0
7   getSeatsAvailable  return             7              1   1
8   add                flight.availSeats  7              1   1
9   hasNext
10  next               return.availSeats  4              2   4
11  getSeatsAvailable  return             4              2   1
12  add                flight.availSeats  4              2   1
13  hasNext

indicates the number of events observed since its last occurrence. Column RA
in Table 2.4 shows the values produced by the relative-to-access rewriting
strategy. These values capture the patterns that occur in these traces well.
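
The RA column of Table 2.4 can be reproduced with a short sketch. This is not KLFA's implementation; it is a minimal illustration under two assumptions read off the table: a value that has not been seen before is rewritten as 0, and the distance to the last occurrence is counted over the attribute values of the data cluster only (events without attributes, such as hasNext, are skipped). The function name is hypothetical.

```python
def rewrite_relative_to_access(values):
    """Rewrite each concrete value as its distance from its last occurrence.

    Assumptions (inferred from Table 2.4, not from KLFA's code):
    - a value never seen before is rewritten as 0;
    - distance is measured in positions within the cluster's value sequence.
    """
    last_seen = {}   # concrete value -> position of its most recent occurrence
    rewritten = []
    for position, value in enumerate(values):
        if value in last_seen:
            rewritten.append(position - last_seen[value])
        else:
            rewritten.append(0)
        last_seen[value] = position
    return rewritten


# "Actual Values" column of Table 2.4 (rows 2-4, 6-8, 10-12).
actual_values = [4, 4, 4, 7, 7, 7, 4, 4, 4]
print(rewrite_relative_to_access(actual_values))
# prints [0, 1, 1, 0, 1, 1, 4, 1, 1], matching the RA column
```

The concrete values 4 and 7 disappear from the rewritten sequence, so the two iterations over different flights produce the same symbols 0, 1, 1.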
2.5.1.5 Choosing a Rewriting Strategy
The three rewriting strategies discussed above address complementary
aspects, and it is hard to identify a priori the strategy that best suits a
data cluster. The choice of a strategy depends mainly on the nature of the
observed behaviors and on the collected data. Since the amount of data can be
extremely large, testers can seldom inspect data clusters manually to choose
a proper rewriting strategy. KLFA automatically identifies the best rewriting
strategy for each data cluster, based on the observation that the effectiveness
of a rewriting strategy depends on its ability to capture the regularity of
the data flows. We measure this regularity as the number of symbols used
by a rewriting strategy to rewrite a data cluster: the smaller the number of
symbols used to rewrite the concrete values, the better the rewriting strategy
captures the regularity of the data flow.
KLFA selects the best rewritten version of each data cluster by choosing
the one with the smallest number of distinct symbols. To reduce the noise from
spurious values that generate additional symbols, KLFA selects the technique
that rewrites 50% of the attribute values with the fewest symbols. In this
way, it selects the rewritten version that best captures the regularity of
the core behavior of a data cluster.
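
The selection rule can be sketched as follows. This is one possible reading of the 50% criterion, not KLFA's actual code: for each rewritten version, count how many of its most frequent symbols are needed to cover at least half of the attribute values, then pick the version that needs the fewest. The function names and the coverage parameter are hypothetical, and the example compares only the RI and RA columns of Table 2.4.

```python
import math
from collections import Counter


def symbols_needed(rewritten, coverage=0.5):
    """Smallest number of distinct symbols covering `coverage` of the values.

    Assumption: symbols are taken from most to least frequent, so rare
    (possibly spurious) symbols are ignored first.
    """
    if not rewritten:
        return 0
    target = math.ceil(coverage * len(rewritten))
    covered = 0
    used = 0
    for _symbol, count in Counter(rewritten).most_common():
        covered += count
        used += 1
        if covered >= target:
            break
    return used


def choose_strategy(rewritten_versions, coverage=0.5):
    """Return the name of the version that needs the fewest symbols."""
    return min(
        rewritten_versions,
        key=lambda name: symbols_needed(rewritten_versions[name], coverage),
    )


# RI and RA columns of Table 2.4 (rows that carry attribute values).
versions = {
    "RI": [0, 1, 1, 0, 1, 1, 2, 2, 2],
    "RA": [0, 1, 1, 0, 1, 1, 4, 1, 1],
}
print(choose_strategy(versions))  # prints RA
```

In this example RA covers more than half of the values with the single symbol 1, whereas RI needs two symbols, so the spurious 4 in the RA column does not penalize it.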
 