A Data Mining Based Publish/Subscribe System over Structured Peer-to-Peer Networks - Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

Information Technology Reference

In-Depth Information

With the development of distributed technologies, Distributed Hash Table (DHT)

is widely employed to provide systems with scalability and self-organizing. In a

DHT-based publish/subscribe system, a node can play three roles simultaneously,

including subscriber, publisher and broker. As a subscriber, it registers its interest on

some brokers by means of subscriptions, and receives events that satisfy its interest

from brokers. As a publisher, it sends events that contain certain information to

brokers, regardless of the final receivers. And as a broker, it matches received events

with subscriptions and delivers them to the matched subscribers.

In a DHT-based publish/subscribe system, a broker is also called a rendezvous

point (RP). Generally speaking, only when an event and a subscription meet on a RP

can they match. As a result, the policy for subscription installation and event publi-

cation is of critical importance. Up to now, several proposals have been put forward

for DHT and content based systems. Gupta [3] and Li [8] create Cartesian spaces

whose dimensions are decided by the number of attributes in the system. Each RP

is responsible for the matching within a sub-space, and subscriptions and events are

installed according to value ranges or values of their attributes. Complexities of such

systems increase dramatically with the growth of the number of attributes in the sys-

tem. The proposals of Terpstra [4] and Silvia [5] apply filter-based routing, and they

combine the matching process with event publication. However, organization and

maintenance of subscriptions are always complicated for filter-based routing strat-

egy. Ferry [6] and Eferry [7] manage subscriptions and events according to names of

attributes contained in them. Ferry assigns a subscription to a certain RP by hashing

the name of an attribute contained in the subscription, and each event is published

to all RPs for matching. Eferry improves Ferry to maintain a suitable quantity of

RP nodes in the system and keep an even load distribution among them, but it suf-

fers from high event publication cost. PEM [9] and Vinifera [10] also install their

subscriptions according to names of attributes.

The loads of RPs and the corresponding load balancing performance are always

key issues in DHT-based publish/subscribe systems. In fact, the load of a RP mainly

comprises two parts: overhead for matching and overhead for message transmission.

The former is affected by three factors: the number of subscriptions stored on the RP,

the number of events the RP receives and the matching complexity. And the latter is

impacted by the number of subscriptions and events exist in the system. However,

none of the previous proposals evaluated the loads of RPs comprehensively.

In this paper, we propose a data mining based publish/subscribe system (DMPSS).

In DMPSS, multiple factors are taken into consideration to optimize loads of RPs.

We use data mining technology to find attributes that are usually subscribed together,

e.g. frequent itemset. Then RPs are divided into two categories. One category is re-

sponsible for matching subscriptions and events that contain frequent itemsets, and

the other category is responsible for matching subscriptions and events that don't

contain frequent itemsets. Subscriptions and events are installed by frequent item-

sets contained in them. The advantages of DMPSS are as follows:

1. It can be applied on any overlay networks easily.

2. It achieves even matching load distribution on RPs. In DMPSS, RPs related to

frequent itemsets store larger number of subscriptions than RPs that are not related

Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

Search WWH ::

Custom Search

Home