Yet another benefit of vnodes is faster repair. Node repair requires building a Merkle
tree (covered later in this chapter) for each range of data that a node holds. The tree is
compared with the trees built on the replica nodes and, where they differ, the data is
re-synced. Building a Merkle tree means iterating through all the data in the range, and
streaming can start only after that. For a large range, building the Merkle tree can be very
time consuming, while the data transfer itself might be much faster. With vnodes, the
ranges are smaller, which means faster data validation (by comparing with other nodes).
Since Merkle tree creation is broken into many smaller steps (one for each of the many
small vnodes that exist in a physical node), data transmission does not have to wait until
one big range finishes. Also, validation is spread across all the other machines instead of
just a couple of replica nodes.
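The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation: the function names, the bucket count, and the use of SHA-256 and MD5 are all assumptions made for the example. It hashes the rows of one range into a fixed number of leaf buckets, folds the leaves into a single root, and reports which buckets differ between two replicas.

```python
import hashlib


def leaf_hashes(rows, buckets):
    """Hash every row of a range into one of `buckets` leaves (hypothetical layout)."""
    leaves = [hashlib.sha256() for _ in range(buckets)]
    for key in sorted(rows):  # fixed order so both replicas hash identically
        idx = int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets
        leaves[idx].update(f"{key}\x00{rows[key]}\x00".encode())
    return [leaf.digest() for leaf in leaves]


def merkle_root(hashes):
    """Fold a list of leaf hashes pairwise until a single root hash remains."""
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def mismatched_buckets(local_rows, replica_rows, buckets=8):
    """Return the leaf buckets that would need a re-sync; empty if the roots match."""
    local = leaf_hashes(local_rows, buckets)
    replica = leaf_hashes(replica_rows, buckets)
    if merkle_root(local) == merkle_root(replica):
        return []  # identical roots: nothing to stream
    # Simplification: compare leaves directly instead of walking down the tree.
    return [i for i in range(buckets) if local[i] != replica[i]]
```

Note that `leaf_hashes` has to read every row in the range before any comparison can happen, which is why a smaller range per vnode means a shorter wait before the first data can be streamed.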
Tip
As of Cassandra 2.0.9, vnodes are enabled by default, with 256 vnodes per machine. If
for some reason you do not want to use vnodes, disable the feature by commenting out the
num_tokens variable and uncommenting and setting the initial_token variable in
cassandra.yaml. If you are starting with a new cluster or migrating an old cluster to
the latest version of Cassandra, vnodes are highly recommended.
The num_tokens value that you set on a Cassandra node is the number of vnodes on that
machine. So, the total number of vnodes in a cluster is the sum of the vnodes across all
the nodes. You can always picture a Cassandra cluster as a ring made up of many vnodes.
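To make the ring picture concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the node names, the random token assignment, and the helper names `build_ring` and `owner` are hypothetical, and the token span of -2^63 to 2^63 - 1 is the one used by the Murmur3 partitioner.

```python
import bisect
import random


def build_ring(num_tokens_by_node, seed=0):
    """Give every node `num_tokens` random tokens and return the sorted ring."""
    rng = random.Random(seed)
    ring = []
    for node, num_tokens in num_tokens_by_node.items():
        for _ in range(num_tokens):
            # Assumed token span: -2**63 .. 2**63 - 1
            ring.append((rng.randrange(-2**63, 2**63), node))
    ring.sort()
    return ring


def owner(ring, token):
    """Return the node whose vnode is the first one at or after `token`, wrapping around."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]


# Three machines with 256 vnodes each form a ring of 768 vnodes.
ring = build_ring({"node1": 256, "node2": 256, "node3": 256})
print(len(ring))
```

Because each machine's tokens are scattered around the ring, every machine owns many small ranges rather than one large contiguous range.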