Information Technology Reference
In-Depth Information
Figure 5. Generating a 32 char identifier for an entry
Squid, the hot spots in terms of data load could be
avoided, but as a trade-off this results in a large
number of peers to be queried to find an entry.
In the following section, we introduce the
Extended Prefix Hash Tree as a way of enabling
efficient range queries, while preserving the ad-
vantages of the Random Oracle Model for hashing,
which results in a balanced distribution of the
entries among the peers in the DHT.
to filter out the right results when a user searched
for “Müller”.
The identifier length must be sufficient to en-
sure that a unique identifier can be built for each
entry with high probability. In our evaluation, the
identifiers were 32 characters long. Identifiers that
are longer than that are truncated; identifiers that
are shorter are padded with random characters.
Growing the Tree
EXTENDED PREFIX HASH
TREE ALGORITHM
The structure of an EPHT is shown in Figure
6. There are two parameters that determine the
shape of the tree:
Each entry in the distributed phone book is
associated with an identifier. The identifier is
a fixed-length string, consisting of the capital
characters [A-Z]. Identifiers are built by concat-
enating keywords from the entry. In the example
in Figure 5, we used the keywords last name, first
name, and city.
The order of the keywords determines the
relevance of these keywords for range queries. If
identifiers are built as in Figure 5, it is possible
to search for the last name without knowing the
city, but it is not possible to search for the city
without knowing the last name. This corresponds
to the hierarchical structure of printed phone
books, where entries are ordered by city, last name,
first name, etc. In order to allow alternative key-
word orders, the application must maintain sev-
eral trees in parallel.
Special characters like whitespaces or the Ger-
man ä, ö, ü, ß are omitted. That way, both German
names “Müller” and “Möller” map into the same
string “MLLER”. It is up to the application layer
1. n is the number of children per node. Each
edge is labeled with a character set, like
[S-Z]. The partitioning of the alphabet into
character sets is fixed and globally known,
and cannot be changed dynamically during
runtime. n is the number of character sets,
which can be any number between 2 and 26.
In the section on evaluation, we show that
the best performance is achieved with n =26.
2. m is the maximum load of the root node,
i.e. the maximum number of entries that
can be stored on the root node. If the root
node exceeds its maximum load, it splits up
into n child nodes and distributes all entries
among the children. The maximum load of
each child equals the maximum load of the
parent node plus one. The reason for incre-
menting the maximum load is to prevent
recursive splits, if all entries happen to be
stored on the same child. If a child node's
prefix length (see below for the definition
Search WWH ::




Custom Search