Monitoring - Mastering Apache Cassandra

Like decommission, the move command streams data off the node. Both operations can generate a lot of network traffic and temporarily degrade performance, so they should be scheduled for periods when the application load is low. Refer to Chapter 6, Managing a Cluster - Scaling, Node Repair, and Backup, for more information on the move command.

repair

The repair command synchronizes the data held by the replicas. It helps the cluster avoid returning stale data from a node that was down when a delete was issued (the reappearance of deleted data). It is highly recommended to run repair periodically on the whole cluster to avoid forgotten deletes. The interval between two consecutive repairs should be less than the value assigned to gc_grace_seconds (configured per table; the default is 10 days).
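The timing rule can be expressed as a simple comparison. The following is a minimal sketch; the function name repair_interval_is_safe and the margin parameter are assumptions made for this illustration, and only the 10-day default (864000 seconds) comes from the text.

```python
from datetime import timedelta

# Default gc_grace_seconds: 10 days, as stated in the text.
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 864000


def repair_interval_is_safe(interval,
                            gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS,
                            margin=timedelta(0)):
    """Return True if interval plus margin is shorter than gc_grace_seconds.

    interval: time between two consecutive repairs (timedelta).
    margin:   hypothetical headroom, e.g. for how long a repair takes to run.
    """
    return (interval + margin).total_seconds() < gc_grace_seconds


if __name__ == "__main__":
    print(repair_interval_is_safe(timedelta(days=7)))   # True: weekly repair fits
    print(repair_interval_is_safe(timedelta(days=14)))  # False: too infrequent
```

A table with a non-default gc_grace_seconds would pass its own value as the second argument.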

Hinted handoff helps only as long as no hardware failure lasts longer than the value assigned to max_hint_window_in_ms. Therefore, it is generally a good idea to set up a cron job on your production machines that executes nodetool repair on all the nodes.

The way forgotten deletes come back can be explained with an example. Let's say you have two nodes, A and B, with the data X replicated between them. If you issue a delete with CL.ONE while node B is down, the client gets a success response, and hinted handoff makes a note to resend the request to B when it comes back. If, unfortunately, node B does not come back before the hint is cleared, and gc_grace_seconds then elapses, the tombstone that marks X as deleted is purged from node A during compaction. Now, if node B comes back to life, it still holds X, and Cassandra treats the deleted row as a new row that is not replicated to node A (it will be copied to node A during read repair). Running a node repair at this point does not fix the problem, because gc_grace_seconds has already been exceeded.
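The scenario above can be replayed with a toy model. This is only a sketch under simplifying assumptions: ToyNode, sync, and run_scenario are hypothetical names, timestamps are plain integers in seconds, and sync is a crude stand-in for repair or read repair (the newest cell wins). It is not how Cassandra is implemented.

```python
GC_GRACE_SECONDS = 864000  # 10 days, the default mentioned in the text


class ToyNode:
    """A toy replica: key -> (timestamp, value); value None marks a tombstone."""

    def __init__(self, name):
        self.name = name
        self.cells = {}

    def write(self, key, value, ts):
        current = self.cells.get(key)
        if current is None or ts > current[0]:
            self.cells[key] = (ts, value)

    def delete(self, key, ts):
        self.write(key, None, ts)  # a delete is a write of a tombstone

    def compact(self, now):
        """Drop tombstones older than GC_GRACE_SECONDS."""
        self.cells = {
            k: (ts, v) for k, (ts, v) in self.cells.items()
            if not (v is None and now - ts > GC_GRACE_SECONDS)
        }

    def read(self, key):
        cell = self.cells.get(key)
        return None if cell is None else cell[1]


def sync(a, b):
    """Toy stand-in for repair / read repair: newest cell wins on both nodes."""
    for key in set(a.cells) | set(b.cells):
        for src, dst in ((a, b), (b, a)):
            if key in src.cells:
                ts, value = src.cells[key]
                dst.write(key, value, ts)


def run_scenario(repair_before_gc_grace):
    a, b = ToyNode("A"), ToyNode("B")
    a.write("X", "data", ts=0)
    b.write("X", "data", ts=0)
    a.delete("X", ts=100)            # B is down; only A records the tombstone
    if repair_before_gc_grace:
        sync(a, b)                   # B returned in time and was repaired
    later = 100 + GC_GRACE_SECONDS + 1
    a.compact(now=later)             # tombstones past gc_grace are purged
    b.compact(now=later)
    sync(a, b)                       # replicas exchange data again
    return a.read("X"), b.read("X")


if __name__ == "__main__":
    print(run_scenario(repair_before_gc_grace=False))  # ('data', 'data')
    print(run_scenario(repair_before_gc_grace=True))   # (None, None)
```

Without a repair inside the grace period, X reappears on both nodes; with one, the tombstone reaches B before it is purged and X stays deleted.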

The nodetool repair command has the following format:

nodetool -h HOSTNAME repair [Keyspace] [cfnames] [-pr]

Therefore, repair can be executed for a node, a given keyspace, or a list of tables.
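For a scheduled job such as the cron job suggested earlier, the command line can be assembled programmatically. The following is a minimal sketch that follows the format shown above; build_repair_command and repair_nodes are hypothetical helper names, and the host list is whatever your deployment supplies.

```python
import subprocess


def build_repair_command(host, keyspace=None, tables=(), primary_range=False):
    """Build: nodetool -h HOSTNAME repair [Keyspace] [cfnames] [-pr]"""
    if tables and not keyspace:
        raise ValueError("table names require a keyspace")
    cmd = ["nodetool", "-h", host, "repair"]
    if keyspace:
        cmd.append(keyspace)
    cmd.extend(tables)
    if primary_range:
        cmd.append("-pr")
    return cmd


def repair_nodes(hosts, **kwargs):
    """Run repair on each host in turn.

    Raises subprocess.CalledProcessError if nodetool exits with an error.
    """
    for host in hosts:
        subprocess.run(build_repair_command(host, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it; the address is a placeholder.
    print(" ".join(build_repair_command("10.0.0.1", "my_keyspace",
                                        ["users"], primary_range=True)))
```

Running the repairs one host at a time, as repair_nodes does, is a design choice made for this sketch to keep the extra load confined to one node's repair at a time.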

There is an interesting option called primary range, denoted by -pr. With this option, repair covers only the primary range of the node, that is, the token range for which the node is the first replica. Without it, the command repairs every range stored on the node, including the ranges it holds as a replica for other nodes. Thus, if you are planning to repair an entire cluster by running repair on each node, you should use the -pr switch. Otherwise, you are duplicating the task RF times. The following are the scenarios for the repair command:
