cached, Cassandra checks the SSTables on disk. If the row isn't in the SSTables,
the MemTables are checked. Wherever the row is found, in the SSTables or the
MemTables, the row cache is populated with the value returned for that row. If
the application requests a column whose row is already in the row cache,
Cassandra grabs the row from the cache, pulls the column out, and returns it to
the application.
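To make the read path above concrete, here is a minimal Python sketch. It is a toy model, not Cassandra's code: `ToyNode`, `get_column`, and the dictionary-based stores are hypothetical names chosen for illustration. It follows the order the text describes (SSTables, then MemTables) and caches the whole row on a miss. As a simplification, it returns the first complete row it finds rather than merging row fragments across stores by timestamp, as a real node does.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]  # column name -> value


class ToyNode:
    """Hypothetical model of the row-cache read path described above."""

    def __init__(self, sstables: List[Dict[str, Row]], memtables: List[Dict[str, Row]]):
        self.sstables = sstables    # on-disk stores, checked first per the text
        self.memtables = memtables  # in-memory stores, checked second
        self.row_cache: Dict[str, Row] = {}

    def _find_row(self, row_key: str) -> Optional[Row]:
        # Check the SSTables, then the MemTables; return the first match.
        for store in self.sstables + self.memtables:
            if row_key in store:
                return store[row_key]
        return None

    def get_column(self, row_key: str, column: str) -> Optional[str]:
        row = self.row_cache.get(row_key)
        if row is None:  # cache miss
            row = self._find_row(row_key)
            if row is None:
                return None  # the row does not exist anywhere
            # Populate the row cache with the whole row, wherever it was found.
            self.row_cache[row_key] = row
        # Cache hit (or freshly cached row): pull out the requested column.
        return row.get(column)


node = ToyNode(
    sstables=[{"user1": {"name": "Ann", "city": "Oslo"}}],
    memtables=[{"user2": {"name": "Bo"}}],
)
print(node.get_column("user2", "name"))  # prints "Bo"; "user2" is now cached
```

Because the whole row is cached, a later request for any other column of `user2` is served from `row_cache` without touching the stores.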
General Caching Tips
Here are some general rules you can follow to use caching efficiently:
- Store less recently used data, or wide rows, in a ColumnFamily with minimal
  or no caching.
- Try to create logical separations of heavily read data. You can do this by
  breaking your frequently used data apart into several ColumnFamilies and
  tuning the caching on each ColumnFamily individually.
- Add more Cassandra nodes. Because Cassandra does a lot of caching for you,
  adding nodes gives a solid benefit: each node then holds a smaller data set,
  so more of that data fits in memory.
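The "add more nodes" tip is simple arithmetic, sketched below in Python. This sketch assumes data and replicas are spread evenly across nodes and introduces a replication factor, which the text does not mention; `per_node_gb`, `nodes_to_fit`, and `cache_gb_per_node` are hypothetical names, and the figures are illustrative only.

```python
import math


def per_node_gb(total_gb: float, replication_factor: int, nodes: int) -> float:
    """Approximate data each node holds, assuming an even spread of replicas."""
    # A node never holds more than one full copy, so at most
    # min(replication_factor, nodes) copies are spread over the cluster.
    copies = min(replication_factor, nodes)
    return total_gb * copies / nodes


def nodes_to_fit(total_gb: float, replication_factor: int, cache_gb_per_node: float) -> int:
    """Smallest node count whose per-node share fits in cache_gb_per_node."""
    needed = math.ceil(total_gb * replication_factor / cache_gb_per_node)
    # At least replication_factor nodes are needed to place every replica separately.
    return max(needed, replication_factor)


print(per_node_gb(600, 3, 10))   # 180.0 GB per node with 10 nodes
print(nodes_to_fit(600, 3, 60))  # 30 nodes to get each share down to 60 GB
```

Doubling the node count halves each node's share (once there are more nodes than replicas), which is why more of the data ends up fitting in memory.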
The other thing to be aware of when it comes to caching is how it affects the
MemTables. A Cassandra MemTable requires an index structure in addition to the
data it stores, so that the MemTable can be searched efficiently for data that
has not yet been written to an SSTable. If the values stored are small relative
to the number of rows and columns in that MemTable, the overhead of supporting
this index may not be worth it.
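The trade-off in the previous paragraph can be seen with a quick calculation. The Python sketch below assumes a fixed index cost per stored cell (one row/column entry); the 64-byte figure is hypothetical, chosen only for illustration, and is not a measured Cassandra number. Because both the index and the values grow with the number of cells, the fraction depends only on the per-cell sizes.

```python
def index_overhead_fraction(avg_value_bytes: float, index_bytes_per_cell: float) -> float:
    """Fraction of MemTable memory used by the index rather than by values.

    Assumes a fixed, hypothetical index cost per stored cell. Both totals
    scale with the number of cells, so the cell count cancels out.
    """
    return index_bytes_per_cell / (index_bytes_per_cell + avg_value_bytes)


HYPOTHETICAL_INDEX_BYTES = 64  # illustrative only, not a measured Cassandra figure

for value_size in (8, 1024):
    frac = index_overhead_fraction(value_size, HYPOTHETICAL_INDEX_BYTES)
    print(f"{value_size:>5}-byte values: {frac:.0%} of MemTable memory is index")
```

With these made-up numbers, 8-byte values spend roughly 89% of the space on the index, versus about 6% for 1 KB values.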
Global Cache Tuning
There is an easy performance gain to be had if you have a small data set per node.
First, we need to define small in this scenario: the data set is small if it fits
in memory. In the cassandra.yaml file, the setting populate_io_cache_on_flush is
false by default, because most data sets are expected not to fit in memory. If
yours does, setting it to true means the cache will be populated on MemTable
flushes and compactions. This can greatly speed up your query times, because all
data is loaded into the cache as soon as it becomes available.
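The rule of thumb above ("small" means the per-node data set fits in memory) can be written as a tiny decision helper. This Python sketch only builds the cassandra.yaml line from two numbers you supply; `populate_io_cache_line` and its parameters are hypothetical, and whether your Cassandra version still reads this setting should be checked against its own cassandra.yaml.

```python
def populate_io_cache_line(data_gb_per_node: float, memory_gb_for_cache: float) -> str:
    """Suggest the cassandra.yaml line using the 'small means fits in memory' rule."""
    fits_in_memory = data_gb_per_node <= memory_gb_for_cache
    value = "true" if fits_in_memory else "false"  # the default is false
    return f"populate_io_cache_on_flush: {value}"


print(populate_io_cache_line(40, 64))   # populate_io_cache_on_flush: true
print(populate_io_cache_line(200, 64))  # populate_io_cache_on_flush: false
```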
One of the most common types of caches for a database is a key cache. Cassandra
gives you the ability to control the maximum size of the key cache. By de-