Hard disk capacity
A rough calculation of the disk space needed for the user data that will be stored in Cassandra involves adding up the data stored in four components on disk: commit logs, SSTables, index files, and bloom filters. When comparing incoming raw data with the data on disk, you need to account for the database overhead associated with each type of data; the data on disk can be about twice as large as the raw data. Disk usage can be estimated using the following snippet:
# Size of one normal column
column_size (in bytes) = column_name_size + column_val_size + 15
# Size of an expiring or counter column
col_size (in bytes) = column_name_size + column_val_size + 23
# Size of a row
row_size (in bytes) = size_of_all_columns + row_key_size + 23
# Primary index file size
index_size (in bytes) = number_of_rows * (32 + mean_key_size)
# Additional space consumption due to replication
replication_overhead = total_data_size * (replication_factor - 1)
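As a minimal sketch of how these formulas combine, the following Python helper estimates on-disk size for a hypothetical table; the row count, key sizes, and replication factor in the example are illustrative assumptions, not recommendations.

# Rough Cassandra disk-usage estimate based on the formulas above.
# All sizes are in bytes; the example inputs are illustrative assumptions.

REGULAR_COLUMN_OVERHEAD = 15    # overhead per normal column
COUNTER_COLUMN_OVERHEAD = 23    # overhead per expiring or counter column
ROW_OVERHEAD = 23               # overhead per row
INDEX_ENTRY_OVERHEAD = 32       # overhead per primary index entry

def column_size(name_size, value_size, expiring_or_counter=False):
    overhead = COUNTER_COLUMN_OVERHEAD if expiring_or_counter else REGULAR_COLUMN_OVERHEAD
    return name_size + value_size + overhead

def row_size(column_sizes, row_key_size):
    return sum(column_sizes) + row_key_size + ROW_OVERHEAD

def index_size(number_of_rows, mean_key_size):
    return number_of_rows * (INDEX_ENTRY_OVERHEAD + mean_key_size)

def replication_overhead(total_data_size, replication_factor):
    return total_data_size * (replication_factor - 1)

# Hypothetical table: 1 million rows, 10 normal columns per row.
rows, columns_per_row = 1_000_000, 10
one_column = column_size(name_size=10, value_size=100)
one_row = row_size([one_column] * columns_per_row, row_key_size=16)
data_plus_index = rows * one_row + index_size(rows, mean_key_size=16)
print("data + index (bytes):", data_plus_index)
print("extra space for replication at RF=3:", replication_overhead(data_plus_index, 3))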
Apart from this, the disk also sees heavy read and write activity during compaction. Compaction is the process that merges SSTables to improve read efficiency. The important thing about compaction is that, in the worst case, it may temporarily use as much additional space as the user data itself occupies, so it is a good idea to leave plenty of free disk space. We'll discuss this again later, but the headroom needed depends on the compaction_strategy that is applied: for LeveledCompactionStrategy, about 10 percent free space is enough, whereas SizeTieredCompactionStrategy requires up to 50 percent free disk space in the worst case.
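As a rough way to turn those percentages into capacity planning, here is a small sketch; the class names are Cassandra's actual compaction strategies, but the 10 and 50 percent figures are only the rules of thumb quoted above, not exact guarantees.

# Estimate the disk capacity to provision so that the stored data still
# leaves the recommended fraction of the disk free for compaction.

FREE_SPACE_RULE_OF_THUMB = {
    "LeveledCompactionStrategy": 0.10,     # keep about 10% of the disk free
    "SizeTieredCompactionStrategy": 0.50,  # keep up to 50% free (worst case)
}

def required_capacity(data_size_bytes, strategy):
    free_fraction = FREE_SPACE_RULE_OF_THUMB[strategy]
    return data_size_bytes / (1.0 - free_fraction)

# Example: provisioning for 1 TB of on-disk data under each strategy.
for strategy in FREE_SPACE_RULE_OF_THUMB:
    print(strategy, round(required_capacity(10**12, strategy) / 10**12, 2), "TB")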
Here are some rules of thumb with regard to disk choice and disk operations:
Commit logs and data files on separate disks: Commit logs are updated on each write and are read only at startup, which is rare. The data directory, on the other hand, is written to whenever MemTables are flushed into SSTables asynchronously, it is read and rewritten during compaction, and, most importantly, it may be looked up by client reads (see the configuration sketch below).
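As a configuration sketch of that point, the relevant settings live in cassandra.yaml; the directory paths below are placeholders, assuming each path is mounted on its own physical disk.

# cassandra.yaml (excerpt): keep the commit log and the data files on
# separate physical disks; the paths shown are placeholder mount points.
commitlog_directory: /mnt/disk1/cassandra/commitlog
data_file_directories:
    - /mnt/disk2/cassandra/data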