Database Reference
In-Depth Information
cassandra.yaml. If you have a large heap size and your use case is having Cas-
sandra under a write-heavy workload, this value can be safely increased.
On a similar note, the
memtable_flush_queue_size
has an effect on
the speed and efficiency of a MemTable flush. This value determines the number
of MemTables to allow to wait for a writer thread to flush to disk. This should
be set, at a minimum, to the maximum number of secondary indexes created on a
single ColumnFamily. In other words, if you have a ColumnFamily with ten sec-
ondary indexes, this value should be 10. The reason for this is that if the
memt-
able_flush_queue_size
is set to 2 and you have three secondary indexes
on a ColumnFamily, there are two options available: either flush the MemTable
and each index not updated during the initial flush will be out of sync with the
SSTable data, or the new SSTable won't be loaded into memory until all the in-
dexes have been updated. To avoid either of these scenarios, it is recommended
that the value of
mem_table_flush_queue_size
be set to ensure that all
secondary indexes for a ColumnFamily can be updated at flush time.
Concurrency
Cassandra was designed to have faster write performance than read performance.
One of the ways this is achieved is using threads for concurrency. There are two
settings in the cassandra.yaml that allow control over the amount of concurrency:
concurrent_reads
and
concurrent_writes
. By default, both of these
values are set to 32. Since the writes that come to Cassandra go to memory,
the bottleneck for most performance is going to the disk for reads. To calcu-
late the best setting for
concurrent_reads
, the value should be 16 * $num-
ber_of_drives. If you have two drives, you can leave the default value of 32. The
reason for this is to ensure that the operating system has enough room to decide
which reads should be serviced at what time.
Coming back to writes, since they go directly to memory, the work is CPU
bound as opposed to being I/O bound like reads. To derive the best setting for
concurrent_writes
, you should use 8 * number_of_cores in your system. If
you have a quad-core dual-processor system, the total number of cores is eight and
the
concurrent_writes
should be set at 64.