node held a portion of the total data, a technique referred to as sharding: breaking the
database up into shards. Queries are broken into sub-queries, which are then applied
to specific nodes in the server cluster. The results of these sub-queries are then
aggregated to produce the final answer, so all resources are exploited in parallel. To improve
performance or cater to larger data volumes, more nodes are added to the cluster as and
when needed.
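As a rough illustration, here is a minimal sketch in Python of such a scatter-gather query. It assumes an in-memory list of dictionaries standing in for the shards held by separate nodes; the names (shards, sub_query, scatter_gather) are purely illustrative and not tied to any particular product.

from concurrent.futures import ThreadPoolExecutor

# Illustrative in-memory "shards": each entry stands in for the slice of
# data held by one node; in a real cluster these would be separate servers.
shards = [
    {"alice": 120, "dave": 85},
    {"bob": 42, "erin": 310},
    {"carol": 7, "frank": 58},
]

def sub_query(shard, predicate):
    """Run the sub-query against a single shard (one node)."""
    return [(key, value) for key, value in shard.items() if predicate(key, value)]

def scatter_gather(predicate):
    """Send the sub-query to every node in parallel, then aggregate the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: sub_query(s, predicate), shards))
    return [row for partial in partials for row in partial]

# Example: find every record with a value greater than 50 across the cluster.
print(scatter_gather(lambda key, value: value > 50))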
Most NoSQL databases have a scale-out architecture and can be distributed across
many server nodes. How they handle data distribution, data compression, and node
failure varies from product to product, but the general architecture is similar. They are
usually built in a shared-nothing manner so that no node has to know much about what's
happening on other nodes.
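To make the shared-nothing idea concrete, the following sketch shows one simple way a key could be mapped to the node that owns it; the owner_of function and the hard-coded membership list are hypothetical. Real products typically use more elaborate schemes, such as consistent hashing, so that adding a node moves only a fraction of the keys.

import hashlib

# Hypothetical cluster membership; scaling out means appending to this list.
nodes = ["node-0", "node-1", "node-2"]

def owner_of(key):
    """Map a key to the node that stores it, using a stable hash.

    Each node only needs the membership list, not any knowledge of what
    the other nodes hold -- the essence of a shared-nothing design.
    """
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

for key in ("user:42", "user:43", "order:7"):
    print(key, "->", owner_of(key))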
The scale-out architecture brings to light two interesting features, and both of these
features focus on the ability to distribute data over a cluster of servers.
Replication: This is all about taking the same data and copying it across multiple
nodes. There are two replication strategies: Master-Slave and Peer-to-Peer.
Master-Slave
In the Master-Slave approach, you replicate data across multiple nodes. One node acts
as the designated master and the rest are slave nodes that keep copies of the entire data
set, thereby providing resilience to node failures. The master node is the most up-to-date
and authoritative source for the data set and is responsible for managing consistency.
Periodically, the slaves synchronize their content with the master.
Master-Slave replication is most helpful for scaling when you have a read-intensive
data set. You can scale horizontally to handle more read requests by adding more slave
nodes and ensuring that all read requests are routed to the slaves. However, this approach
hits a major bottleneck when the workload is both read- and write-intensive: the master
has to juggle incoming updates and pass them on to the slave nodes to keep the data
consistent everywhere!
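A toy sketch of this routing logic, assuming Python and plain dicts standing in for database connections, is shown below; the MasterSlaveRouter class is invented for illustration and does not reflect any specific database's API.

import random

class MasterSlaveRouter:
    """Toy read/write router: writes go to the master, reads to the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def write(self, key, value):
        # Only the master accepts writes; it is the authoritative copy.
        self.master[key] = value

    def replicate(self):
        # Periodic synchronization: each slave pulls the master's state.
        for slave in self.slaves:
            slave.update(self.master)

    def read(self, key):
        # Reads scale horizontally: any slave can answer.
        return random.choice(self.slaves).get(key)

router = MasterSlaveRouter(master={}, slaves=[{}, {}])
router.write("user:1", "Alice")
router.replicate()
print(router.read("user:1"))  # 'Alice' once replication has run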
Peer-To-Peer
While the Master-Slave approach provides read scalability, it severely lacks write
scalability. The Peer-to-Peer replication approach addresses this issue by doing away with
the master node altogether. All replica nodes have equal weight: they all accept write
requests, and the loss of any node doesn't prevent access to the data store, because the
remaining nodes are accessible and hold copies of the same data, although it may not be
the most up-to-date data.
In this approach, the main concern is data consistency across all the nodes:
when you perform write operations on the same data set through two different nodes, you
run the risk of two different users attempting to update the same record at the same time,
thus introducing a write-write conflict. Write-write conflicts of this sort are managed
through a concept called “serialization,” wherein you apply the write operations
one after another. Serialization can be applied in either a pessimistic or an optimistic mode.
The pessimistic mode prevents conflicts from occurring: all write operations
are performed sequentially, and only when they are all done is the data set
made available. The optimistic mode lets conflicts occur, but detects them
and later takes corrective action to sort them out, making the write operations
eventually consistent.
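The following sketch illustrates the optimistic mode with a simple version check (a compare-and-set), where a write that arrives with a stale version is reported as a conflict. The VersionedStore class is hypothetical and only hints at what real products do with version numbers or vector clocks.

class VersionedStore:
    """Toy optimistic serialization: every record carries a version number.

    A write succeeds only if the writer saw the current version; otherwise
    a write-write conflict is reported and the corrective action (retry,
    merge, last-write-wins) is left to the caller.
    """

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self.data.get(key, (0, None))
        if current_version != expected_version:
            return False  # conflict detected: someone else wrote in between
        self.data[key] = (current_version + 1, value)
        return True

store = VersionedStore()
version, _ = store.read("cart:9")
print(store.write("cart:9", version, ["book"]))  # True: first writer wins
print(store.write("cart:9", version, ["pen"]))   # False: stale version, conflict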