Timeouts
Cassandra exposes quite a few configurable timeouts, including how long the
coordinator node in a query should wait for operations to return. The proper
values depend heavily on your environment and your system requirements, and
setting them correctly is critical. If you set the values too high, your queries
will start to stack up while coordinator nodes wait for responses from slow or
down nodes. If you set them too low, coordinator nodes will give responses based
on incomplete information, and the replica sets will have been queried for data
that is never returned to the application.
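As a rough illustration, the coordinator-side timeouts are set in cassandra.yaml.
The fragment below is a sketch rather than a recommendation: the setting names
are the ones used by the 1.2 and 2.x releases (an assumption here, so check the
cassandra.yaml shipped with your version), and the values are placeholders to be
tuned against your own latency requirements.

```yaml
# cassandra.yaml (fragment). Values are illustrative placeholders only.

# How long the coordinator waits for read operations to complete.
read_request_timeout_in_ms: 5000
# How long the coordinator waits for range scans to complete.
range_request_timeout_in_ms: 10000
# How long the coordinator waits for writes to complete.
write_request_timeout_in_ms: 2000
# Default timeout for other, miscellaneous operations.
request_timeout_in_ms: 10000
```

Raising a value gives slow replicas more time to answer at the cost of requests
queuing on the coordinator; lowering it does the opposite.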
Another configurable value is streaming_socket_timeout_in_ms. This is an
important setting because it controls how much time is spent restreaming data
between nodes in the event of a timeout. By default, there is no timeout in
Cassandra for streaming operations. It is a good idea to set one, but not too
low: if a streaming operation times out, the file being streamed is started over
from the beginning. Because some SSTables hold a substantial amount of data, set
the value high enough to avoid unnecessary streaming restarts.
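A minimal sketch of enabling a streaming timeout in cassandra.yaml might look
like the following. The one-hour value is purely an assumption for illustration;
pick a figure that comfortably covers your largest SSTables on your slowest
inter-node link.

```yaml
# cassandra.yaml (fragment). The value is an illustrative assumption.

# Socket timeout for streaming operations, in milliseconds.
# A timed-out stream restarts the current file from the beginning,
# so err on the generous side. 3600000 ms = 1 hour.
streaming_socket_timeout_in_ms: 3600000
```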
Cassandra provides a setting that allows nodes to communicate timeout
information to each other. This option is called cross_node_timeout and defaults
to false. It is off by default because the timing can be synchronized properly
only if the system clocks on all nodes are in sync, which is usually accomplished
with an NTP (Network Time Protocol) server. If this setting is disabled,
Cassandra assumes that the request was instantly forwarded by a coordinator node
to the replica.
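If the clocks on every node are known to be synchronized, the option can be
switched on in cassandra.yaml. The fragment below is a sketch that assumes NTP
(or an equivalent) is already running on all nodes in the cluster.

```yaml
# cassandra.yaml (fragment).

# Let nodes exchange timeout information with each other.
# Enable only when system clocks on all nodes are kept in sync
# (for example via NTP); otherwise leave this at the default of false.
cross_node_timeout: true
```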
CommitLog
The CommitLog, and the way Cassandra has implemented it, is one of the reasons
that Cassandra responds so well to write-heavy workloads. Here are some tricks
for optimizing the CommitLog.
An easy optimization for Cassandra is putting your CommitLog directory on a
separate drive from your data directories. Every write is appended to a
CommitLog segment, so a dedicated drive keeps that sequential I/O from competing
with reads and MemTable flushes on the data drives. This might be easier said
than done depending on your setup. If your servers are hosted in AWS, the
instance stores are your best bet for CommitLog segments on standard machines.
On the hi1.large instances in AWS, which allow you to use solid-state drives
(SSDs), you have access to multiple devices that are faster than the standard
ephemeral drives. But the idea is that