Memtables
Each column family has a single memtable associated with it, and a few settings govern how memtables are treated. The size a memtable can grow to before it is flushed to disk as an SSTable is specified with the MemtableSizeInMB element (binary_memtable_throughput_in_mb in YAML). Note that this value measures the size of the memtable itself in memory, not its heap usage, which will be larger because of the overhead associated with column indexing.
You'll want to balance this setting with MemtableObjectCountInMillions , which sets a threshold for the number of column values that will be stored in a memtable before it is flushed. The corresponding YAML setting is memtable_operations_in_millions : the maximum number of columns, in millions, that will be stored in a single memtable before it is flushed to disk as an SSTable. The default value is 0.3, or roughly 300,000 columns.
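Taken together, these thresholds might appear in the configuration as follows. This is an illustrative cassandra.yaml fragment using the 0.7-era option names discussed above; the values shown are examples, not recommendations, so check the defaults shipped with your version:

```yaml
# Flush the memtable to disk as an SSTable when it reaches this size in memory
binary_memtable_throughput_in_mb: 256

# ...or when this many columns (in millions) have been written to it,
# whichever threshold is crossed first. 0.3 is roughly 300,000 columns.
memtable_operations_in_millions: 0.3
```

Whichever threshold a memtable hits first triggers the flush.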
You can also configure how long a memtable may remain in memory before it is forced to flush to disk, even if it hasn't reached its size or operation-count thresholds. This value is set with the memtable_flush_after_mins element. When the flush is performed, it writes through a flush buffer, whose size you can configure with flush_data_buffer_size_in_mb .
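The flush-timing settings might look like this. Again, this is an illustrative fragment using the option names from this era of Cassandra; the values are placeholders to show the shape of the configuration, not tuning advice:

```yaml
# Force a flush after a memtable has been in memory this many minutes,
# regardless of how full it is
memtable_flush_after_mins: 60

# Size of the buffer used when writing flushed data out to disk
flush_data_buffer_size_in_mb: 32
```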
Another element related to tuning memtables is memtable_flush_writers . This setting, which defaults to 1, controls the number of threads used to write memtables out to disk when a flush becomes necessary. If you have a very large heap, raising this count can improve performance, because these threads block during disk I/O.
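As a sketch, raising the writer count on a large-heap node would look like this (the value 2 is purely illustrative):

```yaml
# More flush-writer threads can help on a large heap, since each
# thread blocks on disk I/O while writing a memtable out as an SSTable
memtable_flush_writers: 2
```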
Concurrency
Cassandra differs from many data stores in that it offers much faster write performance than read performance. Two settings govern how many threads can perform read and write operations: concurrent_reads and concurrent_writes . In general, the defaults Cassandra provides out of the box are very good, but you may want to update the concurrent_reads setting before you start your server, because it is optimal at two threads per processor core. The default of 8 assumes a four-core box; if that's what you have, you're in business. If you have an eight-core box, tune it up to 16.
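The two-threads-per-core rule above is simple enough to express directly. The helper below is hypothetical (not part of Cassandra or any tool); it just restates the arithmetic the text describes:

```python
def recommended_concurrent_reads(cores: int) -> int:
    """Rule of thumb from the text: two read threads per processor core."""
    return 2 * cores

# The shipped default of 8 corresponds to a four-core box;
# an eight-core box would be tuned up to 16.
print(recommended_concurrent_reads(4))  # -> 8
print(recommended_concurrent_reads(8))  # -> 16
```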
The concurrent_writes setting behaves somewhat differently: it should match the number of clients that will write concurrently to the server. If Cassandra is backing a web application server, you can tune this setting from its default of 32 to match the number of threads the application server has available to connect to Cassandra. Java application servers such as WebLogic commonly prefer database connection pools no larger than 20 or 30, but if you're running several application servers in a cluster, you'll need to factor that in as well.
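Putting the two concurrency settings together, a configuration tuned along the lines described above might read (values follow the examples in the text; adjust for your own hardware and client count):

```yaml
# Two read threads per core: 16 for an eight-core box
concurrent_reads: 16

# Sized to match the number of clients writing concurrently;
# 32 is the default cited above
concurrent_writes: 32
```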