To achieve high availability, Stormy uses a successor-list replication mechanism,
as proposed in Chord [21]: every query is replicated on several nodes, and a
replication protocol ensures that every incoming event is processed by every
replica. To cope with overload situations, Stormy uses two main techniques:
1. Load balancing: The decision to balance load is made locally by a node, based
on its current utilization. Each node continuously measures its utilization of
CPU, memory, network bandwidth, and disk space. These parameters form a
utilization factor, which is disseminated via a gossip protocol to all other
nodes in the system. At regular intervals, a node compares its utilization
factor to those of its immediate neighbors. If the load difference exceeds
a specified threshold, the node initiates a load-balancing step; that is, it
transfers part of its data range to its less loaded neighbor. Because every node
makes the load-balancing decision locally, several steps can happen in
parallel, which allows the whole system to react quickly to load variations.
2. Cloud bursting: If the overall load of the system gets too high, a new
node has to be added to the system. The decision to cloud-burst is made by
only one node in the system, the elected cloud bursting leader. This leader
initiates the cloud bursting procedure, which brings up a new node and adds it
to the system. The new node takes a random position on the ring, takes over
parts of the data range from its neighbors, and finally updates the DHT. If
the overall load of the system is too low, the cloud bursting leader instead
decides to drop a node from the system. As with adding a node, the queries and
their state are first moved to the node that takes over the range, the
DHT is updated, and the old node is terminated.
Stormy has been implemented on top of the Cloudy [15] system, which offers a DHT
to distribute and replicate events across the nodes of the system and therefore already
provides scalability and elasticity. On top of Cloudy, the MXQuery stream processing
engine* is used to execute XQuery queries with additional streaming constructs.
Apache Zookeeper is used to ensure consistent leader election.
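The ring structure shared by the DHT and the successor-list replication scheme can be illustrated with a minimal consistent-hashing sketch. All names here are assumptions for illustration; real Chord- or Cloudy-style implementations differ in many details (finger tables, virtual nodes, membership changes):

```python
# Illustrative consistent-hashing ring with successor-list replication.
import hashlib
from bisect import bisect_right

def ring_pos(key: str) -> int:
    # Position on the ring, derived from a hash of the key.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Nodes sorted by their position on the ring.
        self.ring = sorted((ring_pos(n), n) for n in nodes)

    def successors(self, key):
        """The node responsible for `key` plus its successor-list replicas."""
        positions = [p for p, _ in self.ring]
        idx = bisect_right(positions, ring_pos(key)) % len(self.ring)
        return [self.ring[(idx + i) % len(self.ring)][1]
                for i in range(min(self.replicas, len(self.ring)))]

ring = Ring(["n1", "n2", "n3", "n4"], replicas=3)
# Each event (and each query's state) lands on 3 consecutive ring nodes.
print(ring.successors("event-42"))
```

Replicating on consecutive successors means that when a node fails, its range is already held by the next nodes on the ring, which is the property the replication protocol relies on.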
12.8 TWITTER STORM
The Storm system has been presented by Twitter as a distributed and fault-tolerant
stream processing system that instantiates the fundamental principles of Actor
theory. The key design principles of Storm are:
Horizontally scalable: Computations and data processing are performed in
parallel using multiple threads, processes, and machines.
Guaranteed message processing: The system guarantees that each message
will be fully processed at least once, and takes care of replaying
messages from the source when a task fails.
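The at-least-once guarantee can be illustrated with a minimal sketch of a replaying source; the names are assumptions, and Storm's real mechanism (acker tasks tracking whole tuple trees) is considerably more involved:

```python
# Illustrative at-least-once delivery: a message stays pending until it is
# acked, and a failed message is replayed from the source.
import collections

class ReplayingSource:
    def __init__(self, messages):
        self.queue = collections.deque(enumerate(messages))  # (id, payload)
        self.pending = {}  # delivered but not yet fully processed

    def next(self):
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload   # remember until fully processed
        return msg_id, payload

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)   # fully processed, forget it

    def fail(self, msg_id):
        # Replay: put the message back at the head of the queue.
        self.queue.appendleft((msg_id, self.pending.pop(msg_id)))

src = ReplayingSource(["a", "b"])
msg_id, _ = src.next()
src.fail(msg_id)                 # task crashed -> message is replayed
assert src.next() == (0, "a")    # same message delivered again
```

Note that a replayed message may already have had side effects before the failure, which is exactly why the guarantee is "at least once" rather than "exactly once."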
* http://mxquery.org/.
http://zookeeper.apache.org/.
http://github.com/nathanmarz/storm.