Performance Tuning - Practical Cassandra

Timeouts

Cassandra has several configurable timeouts. The right values for these
settings depend heavily on your environment and your system requirements.
Among them are settings that control how long the coordinator node in a query
waits for operations to return. Setting these timeouts properly for your
environment is critical. If the values are too high, queries start to stack up
while coordinator nodes wait for responses from slow or down nodes. If the
values are too low, coordinator nodes give up before slower replicas respond,
so they answer based on incomplete information and the replica sets end up
having been queried for data that is never returned to the application.
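
As a rough illustration, the coordinator-side timeouts described above are set
in cassandra.yaml. The fragment below is a sketch, assuming a Cassandra release
that uses the *_in_ms setting names (newer releases rename some of them). The
values are placeholders to tune against your own latency measurements, not
recommendations.

```yaml
# cassandra.yaml (fragment): illustrative values only
# How long the coordinator waits for read operations to complete.
read_request_timeout_in_ms: 5000
# How long the coordinator waits for range scans to complete.
range_request_timeout_in_ms: 10000
# How long the coordinator waits for writes to complete.
write_request_timeout_in_ms: 2000
# Default timeout for other, miscellaneous operations.
request_timeout_in_ms: 10000
```

Raising a value gives slow replicas more time to answer at the cost of requests
queuing on the coordinator. Lowering it sheds load faster but makes timeout
errors more likely.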

Another configurable value is streaming_socket_timeout_in_ms. This setting is
important because it controls how much time is spent restreaming data between
nodes in the event of a timeout. By default, Cassandra applies no timeout to
streaming operations. It is a good idea to set one, but not too low: if a
streaming operation times out, the file being streamed is started over from
the beginning. Because some SSTables hold a significant amount of data, make
sure the value is high enough to avoid unnecessary streaming restarts.
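
A minimal sketch of what this might look like in cassandra.yaml follows. The
one-hour value is an assumption chosen purely for illustration. Pick a value
based on how long your largest SSTables take to stream, and check that your
Cassandra release still supports this setting.

```yaml
# cassandra.yaml (fragment): illustrative value only
# 0 means streams never time out.
# 3600000 ms = 1 hour, a placeholder meant to be long enough that a brief
# stall does not force a large SSTable to restart from the beginning.
streaming_socket_timeout_in_ms: 3600000
```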

Cassandra provides a setting that allows nodes to communicate timeout
information to each other. This option is called cross_node_timeout and
defaults to false. It is off by default because the timing can be synchronized
properly only if the system clocks on all nodes are in sync, which is usually
accomplished with an NTP (Network Time Protocol) server. When this setting is
disabled, Cassandra assumes that the coordinator node forwarded the request to
the replica instantly.
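
If clock synchronization is already in place across the cluster, enabling the
option is a one-line change in cassandra.yaml. The fragment below is only a
sketch and assumes NTP (or an equivalent) is running on every node.

```yaml
# cassandra.yaml (fragment)
# Enable only when every node's clock is kept in sync (for example via NTP).
# With false (the default), a replica assumes the coordinator forwarded the
# request instantly.
cross_node_timeout: true
```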

CommitLog

The CommitLog, and the way Cassandra implements it, is one of the reasons that
Cassandra responds so well to write-heavy workloads. Here are some tricks for
optimizing the CommitLog.

An easy optimization for Cassandra is putting your CommitLog directory on a
separate drive from your data directories. The CommitLog is appended to on
every write, and its segments are released only after the corresponding
MemTables are flushed to disk, so a dedicated drive keeps that write traffic
from competing with reads and flushes on the data drives. This might be easier
said than done depending on your setup. If your servers are hosted in AWS, the
instance stores are your best bet for CommitLog segments on standard machines.
On the hi1.4xlarge instances in AWS, which provide solid-state drives (SSDs),
you have access to multiple devices that are faster than the standard
ephemeral drives. But the idea is that
