Introducing Big Data Technologies - Data Warehousing in the Age of Big Data - page 91


Table 4.1 Column Family Parameters

Parameter                      Default Value
column_type                    Standard
compaction_strategy            SizeTieredCompactionStrategy
comparator                     BytesType
compare_subcolumns_with        BytesType
dc_local_read_repair_chance    0
gc_grace_seconds               864000 (10 days)
keys_cached                    200000
max_compaction_threshold       32
min_compaction_threshold       4
read_repair_chance             0.1 or 1 (see description below)
replicate_on_write             TRUE
rows_cached                    0 (disabled by default)
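
As a rough illustration of how the defaults in Table 4.1 combine with per-column-family overrides, the sketch below keeps the defaults in a plain dictionary and layers overrides on top. The names DEFAULTS and effective_settings are hypothetical helpers for this example and are not part of any Cassandra client API.

```python
# Defaults copied from Table 4.1. read_repair_chance is left as None because
# the table gives two possible values (0.1 or 1) rather than a single default.
DEFAULTS = {
    "column_type": "Standard",
    "compaction_strategy": "SizeTieredCompactionStrategy",
    "comparator": "BytesType",
    "compare_subcolumns_with": "BytesType",
    "dc_local_read_repair_chance": 0,
    "gc_grace_seconds": 864000,
    "keys_cached": 200000,
    "max_compaction_threshold": 32,
    "min_compaction_threshold": 4,
    "read_repair_chance": None,
    "replicate_on_write": True,
    "rows_cached": 0,
}


def effective_settings(overrides):
    """Return the defaults with the given overrides applied (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown column family parameters: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


# gc_grace_seconds is expressed in seconds; convert it to days.
gc_grace_days = DEFAULTS["gc_grace_seconds"] / (24 * 60 * 60)

settings = effective_settings({"comparator": "UTF8Type", "rows_cached": 1000})
```

Only the overridden parameters change; every other parameter keeps the value listed in the table.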

Data partitioning

Data partitioning can be done either by the client library or by any node of the cluster, and the partition can be calculated using different algorithms. Cassandra provides two native algorithms:

● RandomPartitioner: a hash-based distribution, where the keys are partitioned more evenly across the different nodes, providing better load balancing. In this partitioning, each row and all the columns associated with the row key are stored on the same physical node, and columns are sorted based on their name.

● OrderPreservingPartitioner: creates partitions based on the key, with data grouped by key. This boosts the performance of range queries, since a query needs to hit fewer nodes to retrieve a range of data.
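
The trade-off between the two partitioners can be sketched in a few lines. This is a simplified model, not Cassandra's implementation: the node names, the key boundaries, and the equal split of the hash space are all assumptions made for the example, and MD5 is used here only as a convenient hash function.

```python
import hashlib
from bisect import bisect_right

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names


def hash_partition(key: str) -> str:
    """Hash-based placement: hash the key, then split the 128-bit space evenly."""
    token = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    range_size = 2**128 // len(NODES)
    return NODES[min(token // range_size, len(NODES) - 1)]


# Order-preserving placement: each node owns a contiguous range of raw keys.
# These boundaries are invented for the example.
UPPER_BOUNDS = ["g", "n", "t"]  # node-a: keys < "g", node-b: "g" up to "n", ...


def ordered_partition(key: str) -> str:
    """Order-preserving placement: pick the node whose key range holds the key."""
    return NODES[bisect_right(UPPER_BOUNDS, key)]


def nodes_hit(keys, partition):
    """Return the set of nodes that a query over these keys would touch."""
    return {partition(k) for k in keys}


# A range query over 100 adjacent keys.
range_keys = [f"user{n}" for n in range(100, 200)]
ordered_nodes = nodes_hit(range_keys, ordered_partition)
hashed_nodes = nodes_hit(range_keys, hash_partition)
```

In this model all 100 adjacent keys fall on a single node under the order-preserving scheme, while the hash-based scheme scatters them across nodes. That scattering is what balances load, and it is also why range queries touch more nodes.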

Data sorting

When defining a column family, you can specify how its columns will be sorted when results are returned to the client. Columns are sorted by the “compare with” type defined on their enclosing column family. You can specify a custom sort order; the options provided by default are:

● BytesType: simple sort by byte value; no validation is performed.

● AsciiType: similar to BytesType, but validates that the input can be parsed as US-ASCII.

● UTF8Type: a string encoded as UTF-8.

● LongType: a 64-bit long.

● LexicalUUIDType: a 128-bit UUID, compared lexically (by byte value).

● TimeUUIDType: a 128-bit version 1 Universally Unique Identifier (UUID), compared by timestamp.

● IntegerType: faster than LongType, and supports shorter or longer lengths.
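
To see why the choice of comparator matters, the sketch below sorts the same column names under different comparison rules. It only models the descriptions in the list above and is not Cassandra's comparator code; COMPARATORS, sort_columns, and make_v1_uuid are hypothetical names, and the 8-byte big-endian signed encoding assumed for LongType is an assumption of this example.

```python
import uuid


def make_v1_uuid(timestamp: int) -> uuid.UUID:
    """Build a version 1 UUID carrying the given 60-bit timestamp (example helper)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    # Clock sequence and node are arbitrary fixed values for the example.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             0x80, 0x00, 0x123456789ABC))


# Each comparator is modelled as a sort key over the raw column-name bytes.
COMPARATORS = {
    "BytesType": lambda name: name,                     # raw byte order
    "LongType": lambda name: int.from_bytes(name, "big", signed=True),
    "LexicalUUIDType": lambda name: name,               # byte order of the UUID
    "TimeUUIDType": lambda name: uuid.UUID(bytes=name).time,
}


def sort_columns(names, comparator):
    """Sort column names the way the named comparator would order them."""
    return sorted(names, key=COMPARATORS[comparator])


# The same two 64-bit values sort differently by bytes and by numeric value.
one = (1).to_bytes(8, "big", signed=True)
minus_one = (-1).to_bytes(8, "big", signed=True)
by_bytes = sort_columns([minus_one, one], "BytesType")
by_long = sort_columns([one, minus_one], "LongType")

# Two version 1 UUIDs whose byte order disagrees with their timestamp order.
early = make_v1_uuid(0xFFFFFFFF)
late = make_v1_uuid(0x100000000)
by_time = sort_columns([late.bytes, early.bytes], "TimeUUIDType")
by_lexical = sort_columns([early.bytes, late.bytes], "LexicalUUIDType")
```

Under byte comparison the value 1 sorts before -1, because the encoding of -1 starts with 0xFF, whereas the numeric comparison puts -1 first. Likewise, the low bits of a version 1 UUID timestamp come first in its byte layout, so lexical order and timestamp order can disagree, which is the reason for having a separate TimeUUIDType.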
