Hotspots
A hotspot in a cluster is a node, or a small set of nodes, that shows abnormally high resource
usage. In the context of Cassandra, these are the nodes that receive an abnormally high
number of requests or show high resource usage compared to the other nodes.
A poorly balanced cluster can cause some nodes to own a disproportionately large number of
keys. If every key is equally likely to be requested, the nodes that own more keys have to
serve more requests. Rebalancing the cluster may fix this issue.
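To make the imbalance concrete, the following is a minimal sketch that estimates how much of
the ring each node owns from its token. It assumes one token per node (no virtual nodes) and
the Murmur3Partitioner token range of -2^63 to 2^63 - 1; the function names are hypothetical
and are not part of any Cassandra tool.

```python
# Sketch only: assumes a single token per node and the Murmur3Partitioner range.
RING_SIZE = 2**64      # number of distinct tokens on the ring
MIN_TOKEN = -(2**63)   # smallest Murmur3Partitioner token


def balanced_tokens(node_count):
    """Return equidistant tokens for node_count nodes."""
    return [MIN_TOKEN + i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring that ends at that token.

    A node owns the range from the previous token (exclusive) up to its own
    token (inclusive), wrapping around the ring. Tokens must be distinct.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        # A span of 0 only happens for a single-node ring, which owns everything.
        span = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = span / RING_SIZE
    return shares


if __name__ == "__main__":
    print(ownership(balanced_tokens(4)))  # evenly spaced tokens
    print(ownership([0, 2**62]))          # unevenly spaced tokens
```

With evenly spaced tokens every node owns the same fraction of the ring, while the uneven
pair in the second call leaves one node with three times the range of the other.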
Ordered partitioners, such as ByteOrderedPartitioner, usually have a hard time making sure
that each key range holds an equal amount of data, unless incoming data is equally likely to
fall into each key range. It is suggested that you rework the application to avoid
depending on key ordering and use Murmur3Partitioner or RandomPartitioner, unless you have a
very strong reason to depend on byte-order partitioning. Refer to the Partitioners section
in Chapter 4, Deploying a Cluster.
High-throughput wide rows may cause a hotspot. A row resides on one server (more precisely,
on each of its replicas). If a row gets written to and/or read from at a really high rate,
the node holding it gets loaded disproportionately (and the other nodes are probably idle).
A good idea is to bucket the row key. For example, assume you run a popular website and
decide to document a live presidential debate by recording everything said by the
candidates, host, and audience, streaming this data live while letting users scroll back
and forth through past records. In this case, if you decide to use a single row, you are
creating a hotspot. The ideal thing would be to break the row key into buckets such as
<rowKey>:<bucket_id> and apply round-robin to the buckets to store the data. Because these
keys are distributed across the nodes, the load is now spread over multiple machines. To
fetch the data, you may want to issue a multiget slice across the buckets and merge the
results in the application. The merging should be fast because each row is already sorted.
Refer to the High throughput rows and hotspots section in Chapter 3, Effective CQL.
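The bucketing idea above can be sketched as follows. This is an illustration under stated
assumptions, not a driver example: a plain dictionary stands in for the cluster, each entry
is kept sorted the way a row's columns would be, and the names BucketedWriter, read_merged,
and NUM_BUCKETS are hypothetical.

```python
import bisect
import heapq
from itertools import count

NUM_BUCKETS = 4  # hypothetical bucket count; tune to the write rate


def bucket_key(row_key, bucket_id):
    """Build a key of the form <rowKey>:<bucket_id>."""
    return f"{row_key}:{bucket_id}"


class BucketedWriter:
    """Spread writes for one logical row over several bucketed keys."""

    def __init__(self, row_key, store, num_buckets=NUM_BUCKETS):
        self.row_key = row_key
        self.store = store            # dict standing in for the cluster
        self.num_buckets = num_buckets
        self._sequence = count()      # drives the round-robin choice

    def write(self, timestamp, text):
        bucket_id = next(self._sequence) % self.num_buckets
        row = self.store.setdefault(bucket_key(self.row_key, bucket_id), [])
        # Keep each bucket sorted by timestamp, as a stored row would be.
        bisect.insort(row, (timestamp, text))


def read_merged(row_key, store, num_buckets=NUM_BUCKETS):
    """Fetch every bucket and merge the already sorted rows into one list."""
    rows = [store.get(bucket_key(row_key, b), []) for b in range(num_buckets)]
    return list(heapq.merge(*rows))


if __name__ == "__main__":
    store = {}
    writer = BucketedWriter("debate", store)
    for ts in range(10):
        writer.write(ts, f"statement {ts}")
    print(sorted(store))                      # the bucketed keys
    print(read_merged("debate", store)[:3])   # earliest entries, in order
```

Because every bucket is already ordered, heapq.merge only has to interleave the rows rather
than sort them again, which is why the merge step stays cheap.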
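Another cause of hotspots can be wrong token assignment in a multi data center setup
(refer to Chapter 4, Deploying a Cluster). Suppose you have two nodes, A and B, in data
center 1, and two nodes, C and D, in data center 2, and you calculate equidistant tokens and
assign them to A, B, C, and D in increasing order. It seems OK, but it actually makes node A
and node C hotspots. With a replication strategy that places replicas per data center (such
as NetworkTopologyStrategy), each data center effectively sees only the tokens of its own
nodes, so A and C each end up responsible for about three quarters of the ring within their
data center.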
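The effect of the token order can be checked with a small calculation. The sketch below
assumes one token per node and a replication strategy that walks the ring separately in
each data center; tokens are written on an unsigned 0 to 2^64 ring for readability, since
only the distances between tokens matter. The function name and the alternating layout are
illustrative assumptions, not taken from the text.

```python
RING_SIZE = 2**64  # unsigned ring used for readability; only distances matter


def ownership_within_dc(token_by_node, dc_by_node, ring_size=RING_SIZE):
    """Fraction of the ring each node serves inside its own data center."""
    shares = {}
    for dc in set(dc_by_node.values()):
        # Only nodes of this data center compete for ranges within it.
        members = sorted(
            (node for node in token_by_node if dc_by_node[node] == dc),
            key=token_by_node.get,
        )
        for i, node in enumerate(members):
            previous = token_by_node[members[i - 1]]  # wraps for i == 0
            span = (token_by_node[node] - previous) % ring_size or ring_size
            shares[node] = span / ring_size
    return shares


dc_by_node = {"A": "DC1", "B": "DC1", "C": "DC2", "D": "DC2"}
step = RING_SIZE // 4

# Tokens handed out to A, B, C, D in increasing order, as described above.
sequential = {"A": 0, "B": step, "C": 2 * step, "D": 3 * step}
# Assumed alternative: alternate data centers around the ring.
alternating = {"A": 0, "C": step, "B": 2 * step, "D": 3 * step}

if __name__ == "__main__":
    print(ownership_within_dc(sequential, dc_by_node))
    print(ownership_within_dc(alternating, dc_by_node))
```

Under these assumptions the sequential layout gives A and C three quarters of their data
center's load each, while alternating the data centers around the ring gives every node
one half.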