Information Technology Reference
In-Depth Information
Caching
The original PHT is a binary tree. As shown
in the evaluation, binary trees do not scale
well in a distributed phone book scenario.
Therefore, the EPHT is an n -ary tree, and
we recommend to use n =26, i.e. the size of
the applied alphabet.
As the EPHT never shrinks, inner nodes are im-
mutable. They will never be deleted or altered.
That means that inner nodes can be cached infi-
nitely in the DHT. Whenever a peer learns about
the existence of an inner node, it can cache that
information and respond when that prefix is que-
ried the next time. Without caching, prefixes that
are accessed very frequently would cause a lot of
network traffic for the peer being responsible for
that prefix. Using caching, this network traffic can
be balanced in the DHT.
In the original PHT, if the number of entries
in a subtree falls below a certain threshold,
that subtree collapses into a single leaf
node. The EPHT can only grow, but never
shrink, which enables us to introduce ex-
tensive caching of inner nodes.
Empty nodes are not handled specially in
the PHT algorithm. In the Extended PHT,
we introduced an additional linked list
skipping the empty nodes to improve per-
formance. This is because we observed that
a significant number of prefixes do never
appear in user's names, which results in
empty leaf nodes for these prefixes.
COMPARISON WITH THE
ORIGINAL PREFIX HASH TREES
The Extended Prefix Hash Tree algorithm pre-
sented here derives from the Prefix Hash Tree
(PHT) algorithm proposed in (Rambhadran et al,
2004). However, the original PHT could not have
been used to implement a distributed phone book
without the changes presented in this article. The
novelty of our work is twofold:
The original PHT proposes to handle mul-
tiple keywords using Locality-Preserving
Hashing, as in Squid. In our application,
we simply concatenate the keywords ac-
cording to their priority, and pad the result
with random data.
1. The original PHT is a binary tree enabling
Bit-wise processing of keywords. Its design
does not support caching, and it handles
multiple keywords using a Squid-like ap-
proach. This does not match the requirements
found in the distributed phone book scenario.
Therefore, we extended the PHT in several
respects, as described below.
2. The EPHT algorithm has several configu-
ration parameters, like the number of child
nodes, and the maximum load of a node. We
evaluated the Extended PHT with real-world
phone book data, and showed how to gain
the best performance.
EVALUATION
In this section, we present the simulation results.
The evaluation data is taken from a German
phone book CDROM from 1997, because newer
electronic phone books restrict data export due to
privacy regulations. We used the entries for the
city of Munich, which has 620,853 entries. As each
peer is supposed to provide only its own entry, the
number of peers is equal to the number of entries.
Data Load
In the rest of this section, we will show the
major differences between the EPHT and the
PHT algorithm.
The number of entries per peer is one of our key
performance indicators, as well-balanced data are
the prerequisite for good balancing of the network
Search WWH ::




Custom Search