Information Technology Reference
In-Depth Information
With the development of distributed technologies, Distributed Hash Table (DHT)
is widely employed to provide systems with scalability and self-organizing. In a
DHT-based publish/subscribe system, a node can play three roles simultaneously,
including subscriber, publisher and broker. As a subscriber, it registers its interest on
some brokers by means of subscriptions, and receives events that satisfy its interest
from brokers. As a publisher, it sends events that contain certain information to
brokers, regardless of the final receivers. And as a broker, it matches received events
with subscriptions and delivers them to the matched subscribers.
In a DHT-based publish/subscribe system, a broker is also called a rendezvous
point (RP). Generally speaking, only when an event and a subscription meet on a RP
can they match. As a result, the policy for subscription installation and event publi-
cation is of critical importance. Up to now, several proposals have been put forward
for DHT and content based systems. Gupta [3] and Li [8] create Cartesian spaces
whose dimensions are decided by the number of attributes in the system. Each RP
is responsible for the matching within a sub-space, and subscriptions and events are
installed according to value ranges or values of their attributes. Complexities of such
systems increase dramatically with the growth of the number of attributes in the sys-
tem. The proposals of Terpstra [4] and Silvia [5] apply filter-based routing, and they
combine the matching process with event publication. However, organization and
maintenance of subscriptions are always complicated for filter-based routing strat-
egy. Ferry [6] and Eferry [7] manage subscriptions and events according to names of
attributes contained in them. Ferry assigns a subscription to a certain RP by hashing
the name of an attribute contained in the subscription, and each event is published
to all RPs for matching. Eferry improves Ferry to maintain a suitable quantity of
RP nodes in the system and keep an even load distribution among them, but it suf-
fers from high event publication cost. PEM [9] and Vinifera [10] also install their
subscriptions according to names of attributes.
The loads of RPs and the corresponding load balancing performance are always
key issues in DHT-based publish/subscribe systems. In fact, the load of a RP mainly
comprises two parts: overhead for matching and overhead for message transmission.
The former is affected by three factors: the number of subscriptions stored on the RP,
the number of events the RP receives and the matching complexity. And the latter is
impacted by the number of subscriptions and events exist in the system. However,
none of the previous proposals evaluated the loads of RPs comprehensively.
In this paper, we propose a data mining based publish/subscribe system (DMPSS).
In DMPSS, multiple factors are taken into consideration to optimize loads of RPs.
We use data mining technology to find attributes that are usually subscribed together,
e.g. frequent itemset. Then RPs are divided into two categories. One category is re-
sponsible for matching subscriptions and events that contain frequent itemsets, and
the other category is responsible for matching subscriptions and events that don't
contain frequent itemsets. Subscriptions and events are installed by frequent item-
sets contained in them. The advantages of DMPSS are as follows:
1. It can be applied on any overlay networks easily.
2. It achieves even matching load distribution on RPs. In DMPSS, RPs related to
frequent itemsets store larger number of subscriptions than RPs that are not related
Search WWH ::




Custom Search