Information Technology Reference
In-Depth Information
H , in prefetching, however, the problem focuses
on the maximum support among the sequences.
Such differences make the prefetching algorithm
much more efficient. Furthermore, for practical
usage, we design some additional constraints for
mining prefetching pages:
O ( M log 2 m ) in the worst case, and the extra space
cost from each sorted index is O ( m ) in the worst
case.
The correctness of PSP is discussed as follows.
Lemma . The support of a sequence α + β is always
not greater than the support of α .
Proof . Assuming that { S i } is the set of sequences
whereα + ⊆ S i , it is obvious that for each
S i , we have α ⊆ S i , which indicates the
support of α + β is always not greater than α .
( ) ≤ +1, where S is the prefetch-
ing list and k is the maximum prefetching
buffer size of the user node. Supposing we
push more than k pages to the user node
each time, some of the pushed pages may
be discarded due to the limit of the prefetch-
ing buffer size.
length S
k
From the lemma, we have the corollary as
follows:
We can add a constraint in the definition of
α
⊆ that
i
+ − ≤
i
d
, where
x
1
x
1 1 and d is a given integer
threshold. Because the prefetched pages
should not be far away from the accessed
pages in a sequence of the trace library,
otherwise its may be discarded before
accessed.
x
n
[ ,
]
Corollary . If the support of a sequence α is not
the maximum, the support of a sequence α + β
cannot be the maximum either.
Based on the lemma and the corollary, the
following theorem proves the correctness of
algorithm PrefixSpan-Prefetching:
Our prefetching algorithm named PrefixSpan-
Prefetching (PSP) can be described as Algorithm 1.
The algorithm PSP is efficient since it has no
recursion. The time cost of PSP primarily comes
from the iteration from step 4 to step 26, the
condition to stop this iteration is that length ( S )> k +1
or it cannot find possible item P when spanning
S in step 16. In other words, in the worst case this
iteration should run k times for a prefetching list
of length k . In each iteration, steps 5-17 need to
scan the trace library, whose maximum size is M .
Suppose that the maximum length of sequences
in the trace library is m , in order to perform a
binary search in each sequence, we build a sorted
index in advance for the first occurrence of each
item in the sequence. Therefore, for each item in
the prefetching list, the time cost of PSP is
Theorem . After algorithm PrefixSpan-Prefetch-
ing, the support of sequence S is the maxi-
mum for all possible S , where
2
( ) .
Proof . Step 14 in PSP is only executed in the first
iteration and it can get a sequence S =< o c , P >,
which has the maximum support; step 16 is
executed after the first iteration, it only ac-
cepts item P if the support of S +< P > is equal
to that of S . Before step 19, if the support of
any sequence T is less than that of S , then
there is no sequence that has the maximum
support, with the prefix of T (due to the
corollary). Thus, T should not be spanned
in step 19. Hence, the support of sequence S
is always the maximum in each iteration. ■
length S
≤ +
k
1
Search WWH ::




Custom Search