Monitoring - Mastering Apache Cassandra

Hotspots

A hotspot in a cluster is a node, or a small set of nodes, that shows abnormally high resource usage. In Cassandra, these are the nodes that receive an abnormally high number of requests or use far more resources than the other nodes in the cluster.

A poorly balanced cluster can cause some nodes to own a disproportionately large number of keys. If every key is equally likely to be requested, the nodes that own more keys have to serve more requests. Rebalancing the cluster may fix this issue.
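
As a rough illustration of what a balanced ring looks like, the following sketch computes equidistant tokens for a single data center and the fraction of the ring each token owns. It assumes Murmur3Partitioner's token range of -2**63 to 2**63 - 1 and one token per node; the helper names balanced_tokens and ownership are hypothetical and not part of Cassandra.

```python
# Sketch only: equidistant tokens and ring ownership for one data center.
# Assumes Murmur3Partitioner's token range (-2**63 .. 2**63 - 1) and a
# single token per node. Function names are illustrative, not Cassandra APIs.

RING_SIZE = 2**64
MIN_TOKEN = -(2**63)


def balanced_tokens(node_count):
    """Return one equidistant token per node."""
    return [MIN_TOKEN + (i * RING_SIZE) // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A node owns the range from the previous token (exclusive) up to its own
    token (inclusive), wrapping around the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # i == 0 wraps around to the last token
        # A single-node ring has a span of 0 here, which means the full ring.
        span = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = span / RING_SIZE
    return shares


balanced = ownership(balanced_tokens(4))
skewed = ownership([MIN_TOKEN, 0, 2**62, 2**62 + 2**61])
```

With four nodes, every entry in balanced is 0.25, whereas in skewed the node at token 0 owns half of the ring and would serve roughly half of the requests if all keys are equally popular.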

Ordered partitioners, such as ByteOrderedPartitioner, usually have a hard time making sure that each key range holds an equal amount of data, unless the incoming data is equally likely to fall into each key range. It is suggested that you rework the application to avoid depending on key ordering and use Murmur3Partitioner or RandomPartitioner, unless you have a very strong reason to depend on byte-ordered partitioning. Refer to the Partitioners section in Chapter 4, Deploying a Cluster.
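
The effect can be sketched with a toy model. The code below is not Cassandra's partitioner code: it splits the key space into four equal ranges by the first byte, once using the raw key (an order-preserving placement) and once using the MD5 digest of the key (a stand-in for a hashing partitioner). The key format user_NNNNN and all function names are assumptions made for the example.

```python
# Toy model, not Cassandra's implementation: compare order-preserving
# placement with hashed placement for keys that share a common prefix.
import hashlib
from collections import Counter

NODE_COUNT = 4  # hypothetical cluster size


def byte_ordered_node(key):
    """Place a key by its first raw byte (order-preserving split)."""
    return key.encode("utf-8")[0] * NODE_COUNT // 256


def hashed_node(key):
    """Place a key by the first byte of its MD5 digest (hashed split)."""
    return hashlib.md5(key.encode("utf-8")).digest()[0] * NODE_COUNT // 256


def load_share(keys, place):
    """Return the fraction of keys that lands on each node."""
    counts = Counter(place(key) for key in keys)
    return {node: counts.get(node, 0) / len(keys) for node in range(NODE_COUNT)}


keys = ["user_%05d" % i for i in range(10000)]  # made-up application keys
ordered_share = load_share(keys, byte_ordered_node)
hashed_share = load_share(keys, hashed_node)
```

Because every key starts with the same letter, ordered_share puts all of the keys on one node, while hashed_share spreads them roughly evenly across the four nodes.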

High-throughput wide rows may cause a hotspot. We know that a row resides on one server (actually, on each of its replicas). If a row is written to and/or read from at a really high rate, the nodes holding it get loaded disproportionately (while the other nodes are probably idle). A good idea is to bucket the row key. For example, assume you run a popular website and decide to document a live presidential debate by recording everything said by the candidates, host, and audience, streaming this data live and allowing users to scroll back and forth through past records. If you use a single row for this, you create a hotspot. The ideal thing would be to break the row key into buckets such as <rowKey>:<bucket_id> and store the data across the buckets in round-robin fashion. Because the bucketed keys are distributed across the nodes, the load is now spread over multiple machines. To fetch the data, you may want to multiget slice the buckets and merge them in the application. The merging should be fast because the rows are already sorted. Refer to the High throughput rows and hotspots section in Chapter 3, Effective CQL.
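
The bucketing idea can be sketched in a few lines. The example below uses an in-memory dictionary as a stand-in for the cluster, so it makes no Cassandra calls; the class name BucketedTimeline, the bucket count of 4, and the (timestamp, text) record shape are all assumptions for illustration. Each bucket is kept sorted, mirroring the fact that the entries within a row are stored in order.

```python
# Sketch only: round-robin bucketing of a hot row and a merge on read.
# The dictionary stands in for rows stored on different nodes.
import bisect
import heapq
from itertools import count

BUCKET_COUNT = 4  # hypothetical; tune to the write rate and cluster size


class BucketedTimeline:
    def __init__(self, row_key, bucket_count=BUCKET_COUNT):
        self.row_key = row_key
        self.bucket_count = bucket_count
        self._sequence = count()  # drives the round-robin choice
        self.rows = {self._bucket_key(b): [] for b in range(bucket_count)}

    def _bucket_key(self, bucket_id):
        """Build a key of the form <rowKey>:<bucket_id>."""
        return "%s:%d" % (self.row_key, bucket_id)

    def append(self, timestamp, text):
        """Write one record to the next bucket in round-robin order."""
        bucket_id = next(self._sequence) % self.bucket_count
        # insort keeps each bucket ordered by (timestamp, text).
        bisect.insort(self.rows[self._bucket_key(bucket_id)], (timestamp, text))

    def read_all(self):
        """Fetch every bucket and merge the already-sorted rows."""
        return list(heapq.merge(*self.rows.values()))


timeline = BucketedTimeline("debate")
for ts in range(1, 9):
    timeline.append(ts, "statement %d" % ts)
merged = timeline.read_all()
```

Since every bucket is already sorted, heapq.merge only has to interleave the rows rather than sort them from scratch, which is why the merge step stays cheap.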

Another cause of hotspots can be wrong token assignment in a multi-data-center setup (refer to Chapter 4, Deploying a Cluster). Suppose you have two nodes, A and B, in data center 1, and two nodes, C and D, in data center 2, and you calculate equidistant tokens and assign them to A, B, C, and D in increasing order. It seems OK, but it actually makes node A and node C hotspots.
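
To see why, it helps to work out the ownership inside each data center separately. The sketch below assumes that replicas are placed per data center (as with a topology-aware replication strategy), one replica per data center, one token per node, and Murmur3Partitioner's token range; the function name per_dc_ownership is hypothetical. The alternating assignment shown for comparison is one common remedy and is an assumption here, not something this section prescribes.

```python
# Sketch only: ring ownership computed within each data center.
# Assumes per-data-center replica placement, one replica per data center,
# one token per node, and Murmur3Partitioner's token range.

RING_SIZE = 2**64
MIN_TOKEN = -(2**63)


def per_dc_ownership(node_tokens, node_dc):
    """Return each node's share of the ring among its own data center's nodes."""
    shares = {}
    for dc in set(node_dc.values()):
        members = sorted(
            (token, node) for node, token in node_tokens.items() if node_dc[node] == dc
        )
        for i, (token, node) in enumerate(members):
            previous = members[i - 1][0]  # i == 0 wraps to the last token
            span = (token - previous) % RING_SIZE or RING_SIZE
            shares[node] = span / RING_SIZE
    return shares


node_dc = {"A": "DC1", "B": "DC1", "C": "DC2", "D": "DC2"}

# Equidistant tokens handed out in the order A, B, C, D.
naive = {node: MIN_TOKEN + i * RING_SIZE // 4 for i, node in enumerate("ABCD")}
# Same tokens, but alternating between the data centers: A, C, B, D.
alternating = {node: MIN_TOKEN + i * RING_SIZE // 4 for i, node in enumerate("ACBD")}

naive_shares = per_dc_ownership(naive, node_dc)
alternating_shares = per_dc_ownership(alternating, node_dc)
```

Under these assumptions, naive_shares gives A and C three quarters of the ring each within their data centers and leaves B and D with one quarter, while alternating_shares gives every node one half.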
