Commit logs are removed periodically, once all the data appended to them has been successfully
flushed to the dedicated datafiles. As a result, the commit logs never grow to anywhere near the
size of the datafiles, so the disks that hold them don't need to be as large; this is something to
consider during hardware selection. For example, when Cassandra runs a flush, you'll see something
like this in the server logs:
INFO 18:26:11,497 Enqueuing flush of Memtable-LocationInfo@26830618(52 bytes, 2 operations)
INFO 18:26:11,497 Writing Memtable-LocationInfo@26830618(52 bytes, 2 operations)
INFO 18:26:11,732 Completed flushing /var/lib/cassandra/data/system/LocationInfo-2-Data.db
INFO 18:26:11,732 Discarding obsolete commit log: CommitLogSegment(/var/lib/cassandra/commitlog/CommitLog-1278894011530.log)
Then, if you check the commit log directory, that file has been deleted.
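To confirm this without reading the logs by eye, a small script can pull the discarded segment paths out of the log text and check whether any of them are still on disk. The following is only a sketch: the function names are hypothetical, and the pattern is inferred from the single log excerpt above, so it may need adjusting for other Cassandra versions.

```python
import re
from pathlib import Path

# Assumption: the message format matches the excerpt above. \s* tolerates the
# segment name appearing on the same line or wrapped onto the next one.
DISCARD_RE = re.compile(
    r"Discarding obsolete commit log:\s*CommitLogSegment\(([^)]+)\)"
)


def discarded_segments(log_text):
    """Return the commit log segment paths the log reports as discarded."""
    return DISCARD_RE.findall(log_text)


def still_on_disk(paths):
    """Return the subset of the given paths that still exist on disk."""
    return [p for p in paths if Path(p).exists()]
```

Passing the contents of the server log to discarded_segments and the result to still_on_disk should yield an empty list once the flush has completed.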
By default, the commit log and the datafiles are stored in the following locations:
<CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
<DataFileDirectories>
<DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
</DataFileDirectories>
You can change these values to store the datafiles or commit log in different locations. You can
specify multiple datafile directories if you wish.
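As an illustration of a configuration with more than one datafile directory, the sketch below parses such a fragment and lists the locations it declares. The <Storage> wrapper element and the second directory (/mnt/disk2/cassandra/data) are assumptions made for the example, and storage_locations is a hypothetical helper, not part of Cassandra.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the <Storage> root and the second DataFileDirectory are
# illustrative only; the other values are the defaults shown above.
SAMPLE = """
<Storage>
  <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/mnt/disk2/cassandra/data</DataFileDirectory>
  </DataFileDirectories>
</Storage>
"""


def storage_locations(xml_text):
    """Return (commit_log_dir, [data_dirs]) found in the configuration text."""
    root = ET.fromstring(xml_text.strip())
    commit_log = root.findtext(".//CommitLogDirectory")
    data_dirs = [
        el.text.strip() for el in root.iter("DataFileDirectory") if el.text
    ]
    return (commit_log.strip() if commit_log else None), data_dirs
```

Calling storage_locations(SAMPLE) returns the commit log directory together with both datafile directories, in the order they appear in the file.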
NOTE
You don't need to update these values on Windows, even if you leave them at the Unix-style defaults,
because Windows will automatically adjust the path separator and place the directories under C:\. Of
course, in a real environment, it's a good idea to specify them separately, as indicated.
For testing, you might not see a need to change these locations. However, for maximum performance,
it's recommended that you store the datafiles and the commit logs on separate hard disks.
Cassandra, like many databases, is particularly dependent on the speed of the hard disk and the
speed of the CPUs (it's best to have four or eight cores, to take advantage of Cassandra's highly
concurrent construction). So for QA and production environments, get the fastest disks you can,
and get at least two separate ones so that the commit log files and the datafiles are not
competing for I/O time. As for CPUs, it's more important to have several processing cores than
one or two very fast ones.
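One rough way to check whether the commit log and the datafiles share storage is to compare the device IDs the operating system reports for each directory. This is a sketch under assumptions: the helper names are hypothetical, the check is meant for POSIX systems, and st_dev identifies a filesystem rather than a physical disk, so two partitions on the same drive will still look "separate."

```python
import os


def on_same_device(path_a, path_b):
    """True if both paths report the same filesystem device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def shared_devices(commit_log_dir, data_dirs):
    """Return the data directories that share a device with the commit log."""
    return [d for d in data_dirs if on_same_device(commit_log_dir, d)]
```

With the default locations, shared_devices("/var/lib/cassandra/commitlog", ["/var/lib/cassandra/data"]) returning a non-empty list suggests that the two are competing for I/O on the same filesystem.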