Memtables
Each column family has a single memtable associated with it, and a few settings govern how memtables are treated. The size a memtable can grow to before it is flushed to disk as an SSTable is specified with the MemtableSizeInMB element (binary_memtable_throughput_in_mb in YAML). Note that this value measures the size of the memtable itself in memory, not its heap usage, which will be larger because of the overhead associated with column indexing.
You'll want to balance this setting with MemtableObjectCountInMillions , which sets a threshold for the number of column values that will be stored in a memtable before it is flushed. The corresponding YAML setting is memtable_operations_in_millions : the maximum number of columns, in millions, that will be stored in a single memtable before it is flushed to disk as an SSTable. The default value is 0.3, or roughly 300,000 columns.
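Taken together, these thresholds might appear in the configuration as follows. This is an illustrative cassandra.yaml fragment using the 0.7-era option names discussed above; the values shown are examples, not recommendations, so check the defaults shipped with your version:

```yaml
# Flush the memtable to disk as an SSTable when it reaches this size in memory
binary_memtable_throughput_in_mb: 256

# ...or when this many columns (in millions) have been written to it,
# whichever threshold is crossed first. 0.3 is roughly 300,000 columns.
memtable_operations_in_millions: 0.3
```

Whichever threshold a memtable hits first triggers the flush.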
You can also configure how long a memtable may remain in memory before it is forced to flush to disk, even if it hasn't reached its size or operation-count thresholds. This value is set with the memtable_flush_after_mins element. When the flush is performed, it writes through a flush buffer, whose size you can configure with flush_data_buffer_size_in_mb .
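The flush-timing settings might look like this. Again, this is an illustrative fragment using the option names from this era of Cassandra; the values are placeholders to show the shape of the configuration, not tuning advice:

```yaml
# Force a flush after a memtable has been in memory this many minutes,
# regardless of how full it is
memtable_flush_after_mins: 60

# Size of the buffer used when writing flushed data out to disk
flush_data_buffer_size_in_mb: 32
```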
Another element related to tuning memtables is memtable_flush_writers . This setting, which defaults to 1, controls the number of threads used to write memtables out to disk when a flush becomes necessary. If you have a very large heap, raising this count can improve performance, because these threads block during disk I/O.
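As a sketch, raising the writer count on a large-heap node would look like this (the value 2 is purely illustrative):

```yaml
# More flush-writer threads can help on a large heap, since each
# thread blocks on disk I/O while writing a memtable out as an SSTable
memtable_flush_writers: 2
```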
Concurrency
Cassandra differs from many data stores in that it offers much faster write performance than read performance. Two settings govern how many threads can perform read and write operations: concurrent_reads and concurrent_writes . In general, the defaults Cassandra provides out of the box are very good, but you may want to update the concurrent_reads setting before you start your server, because it is optimal at two threads per processor core. The default of 8 assumes a four-core box; if that's what you have, you're in business. If you have an eight-core box, tune it up to 16.
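The two-threads-per-core rule above is simple enough to express directly. The helper below is hypothetical (not part of Cassandra or any tool); it just restates the arithmetic the text describes:

```python
def recommended_concurrent_reads(cores: int) -> int:
    """Rule of thumb from the text: two read threads per processor core."""
    return 2 * cores

# The shipped default of 8 corresponds to a four-core box;
# an eight-core box would be tuned up to 16.
print(recommended_concurrent_reads(4))  # -> 8
print(recommended_concurrent_reads(8))  # -> 16
```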
The concurrent_writes setting behaves somewhat differently: it should match the number of clients that will write concurrently to the server. If Cassandra is backing a web application server, you can tune this setting from its default of 32 to match the number of threads the application server has available to connect to Cassandra. Java application servers such as WebLogic commonly prefer database connection pools no larger than 20 or 30, but if you're running several application servers in a cluster, you'll need to factor that in as well.
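Putting the two concurrency settings together, a configuration tuned along the lines described above might read (values follow the examples in the text; adjust for your own hardware and client count):

```yaml
# Two read threads per core: 16 for an eight-core box
concurrent_reads: 16

# Sized to match the number of clients writing concurrently;
# 32 is the default cited above
concurrent_writes: 32
```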