Information Technology Reference

In-Depth Information

4.

Hashing and Bit Vectors

4.1.

Algorithm

In this section, we adopt the BV algorithm to improve the search performance of
Pruned

Tuple Space Search
. Since the filters in the same tuple can be accessed with one hash probe,

tuple space search is suitable for a large amount of filters with orderly length combinations,

which results in fewer tuples. In contrast, the
BV
algorithm could handle the filters with

disordered length combinations. However, the
BV
algorithm is limited in the number of

filters. Therefore, we divide the filters into two categories, where one category is searched

by
BV
and the other is searched by tuple space search. We further propose an algorithm to

minimize the number of tuples for a predetermined memory space of bit vectors.

We use the filters in Table 1 as an example to further illustrate our idea. Since there

are 15(=8+7) bit vectors and each bit vector has 11 bits, the total storage requirement of bit

vectors is 165 bits. The 11 filters can be distinguished into six prefix-length combinations

and each combination corresponds to one hash table. Assume that we select the filters of

length combination
h3, 2i
,
R
6
∼ R
9
, for tuple space search. The remaining seven filters

have 11 distinct prefix specifications; hence, the required bit vector storage is reduced to

77 bits. If we choose further the filters of another length combination
h4, 0i
,
R
10
and
R
11
,

then the bit vector storage would be reduced to
40
bits. While there are only two tuples in

the resulted tuple space, the bit vector storage is reduced to less than a quarter. Since the

search cost of two hash tables is low, the search and storage performance could be improved

simultaneously.

We describe the filter categorization procedure of our scheme in the following. First,

we define a threshold value for the bit vector storage,
S
threshold
. The threshold value can

be set to the size of on-chip static random access memory (SRAM) to improve the access

performance. Next, we derive the original storage requirement of bit vectors by multiplying

the numbers of filters by the number of distinct prefix specifications in the filter database.

We denote the number of filters as
N
and the number of distinct prefixes as
P
. Then, the

storage requirement of bit vectors,
S
, can be expressed as
S = N × P
. In the third step,

we construct a tuple space according to the length combinations of filters. Assume that

the number of the filters in tuple
T
i
is denoted as
N
T
i
and the number of the distinct field

specifications is denoted as
P
T
i
. The storage requirement for the filters in
T
i
is denoted

as
S
T
i
. Then, the maximum value of
S
T
i
is equal to
N
T
i
× P
when the
P
T
i
prefixes are

also specified by the filters of other tuples. Also, the maximum value of
S
T
i
is increased to

S
T
i
= N ×P − (N −N
T
i
)× (P −P
T
i
)
when none of the prefixes in
T
i
is specified by the

filters of other tuples. Therefore, we calculate the numbers of the filters and unique prefix

specifications for each tuple. The number of unique prefix specifications,
P
T
i
, indicates

those which are
only
specified in
T
i
. The value of
S
T
i
is thus expressed as
S
T
i
= N × P −

(N − N
T
i
) × (P − P
T
i
)
. In the fourth step, we extract the filters of the tuple
T
i
with the

largest
S
T
i
value for tuple space search. Such selection can maximize storage reduction for

the bit vectors while only one extra hash table is generated for tuple space search. After

deducting the required bit vector storage of the selected tuple, we check whether the new

storage requirement,
(N − N
T
i
) × (P − P
T
i
)
, is smaller than
S
threshold
. If the answer is

yes
, the procedure of filter categorization is accomplished. Otherwise, the values of
N
and