Indexes and Composite Columns - Beginning Apache Cassandra Development

This allows faster retrieval of records using binary search. Because a B-tree keeps data sorted for faster searching, it introduces some overhead on insert, update, and delete operations, which may require rearranging the index. B-tree is nevertheless the preferred data structure for large volumes of reads and writes, which is why it is widely used in distributed databases.
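Here is a minimal Python sketch of the trade-off described above. It uses a flat sorted list with the standard `bisect` module as a stand-in for a B-tree. A real B-tree spreads its keys across many tree nodes. The `SortedIndex` class is a hypothetical name used only for illustration. Lookups take O(log n) comparisons, but every insert or delete must shift entries to keep the keys in order.

```python
import bisect


class SortedIndex:
    """Hypothetical toy index: keys kept in ascending order plus a
    parallel list of row ids. A flat sorted list, not a real B-tree,
    but it shows the same read/write trade-off."""

    def __init__(self):
        self.keys = []
        self.row_ids = []

    def insert(self, key, row_id):
        # Finding the slot is O(log n); shifting the later entries to
        # make room is O(n). That shifting is the write overhead.
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.row_ids.insert(pos, row_id)

    def delete(self, key):
        # Removing an entry also shifts the entries after it.
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            del self.keys[pos]
            del self.row_ids[pos]

    def lookup(self, key):
        # Binary search: O(log n) comparisons instead of a full scan.
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.row_ids[pos]
        return None


index = SortedIndex()
for row_id, name in enumerate(["mike", "alice", "zoe", "bob"]):
    index.insert(name, row_id)

print(index.keys)           # ['alice', 'bob', 'mike', 'zoe']
print(index.lookup("zoe"))  # 2
```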

Clustered Indexes vs. Non-Clustered Indexes

Indexes that are maintained independently of the physical rows and do not manage the ordering of rows are called non-clustered indexes (see Figure 3-1). Clustered indexes, on the other hand, store the actual rows in sorted order of the indexed field. Because a clustered index stores and manages the ordering of the physical rows, only one clustered index is possible per table.
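The difference between the two kinds of index can be sketched in Python. This is a toy model, not how any particular database stores data. The hypothetical `Table` class keeps its physical rows sorted by one field, which acts as the clustered index. Each non-clustered index is a separate dictionary that points back at the rows. The rows can only have one physical order, so there is one clustered field. By contrast, `add_secondary_index` (also a hypothetical name) can be called for as many fields as needed.

```python
import bisect
from collections import defaultdict


class Table:
    """Toy table: rows are physically stored sorted by one field (the
    clustered index); any number of non-clustered indexes can be added
    as separate structures that point back at the rows."""

    def __init__(self, cluster_field):
        self.cluster_field = cluster_field
        self.rows = []       # physical rows, kept sorted by cluster_field
        self._keys = []      # cluster_field values, parallel to self.rows
        self.secondary = {}  # field -> {value: [row, ...]}

    def add_secondary_index(self, field):
        # Built beside the rows; it does not change their physical order,
        # so a table may have many of these.
        index = defaultdict(list)
        for row in self.rows:
            index[row[field]].append(row)
        self.secondary[field] = index

    def insert(self, row):
        # The clustered index decides where the row physically lives.
        # bisect_right places a row after existing rows with an equal key.
        pos = bisect.bisect_right(self._keys, row[self.cluster_field])
        self._keys.insert(pos, row[self.cluster_field])
        self.rows.insert(pos, row)
        # Every non-clustered index has to be updated separately.
        for field, index in self.secondary.items():
            index[row[field]].append(row)

    def find_by(self, field, value):
        return list(self.secondary[field].get(value, []))


t = Table(cluster_field="id")
t.add_secondary_index("name")
t.add_secondary_index("city")
for row in [{"id": 3, "name": "carol", "city": "Oslo"},
            {"id": 1, "name": "alice", "city": "Rome"},
            {"id": 2, "name": "bob", "city": "Oslo"}]:
    t.insert(row)

print([r["id"] for r in t.rows])                     # [1, 2, 3]
print([r["id"] for r in t.find_by("city", "Oslo")])  # [3, 2]
```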

The important question is in which scenarios we should use clustered indexes and in which we should use non-clustered indexes. For example, a department can have multiple employees (a many-to-one relation from employee to department), and we often need to read employee details by department. Here department is a suitable candidate for a clustered index: all rows containing employee details would be stored and ordered by department for faster retrieval. Employee name, on the other hand, is a good candidate for a non-clustered index. A table can hold multiple non-clustered indexes, but there can only be one clustered index per table.
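The department example can be sketched in a few lines of Python. The employee data, the `employees_in` helper, and the `by_name` map are all hypothetical. The rows are stored sorted by department, so all employees of one department sit next to each other. Two binary searches are enough to find the whole group. The name lookup instead goes through a separate map, which is the non-clustered index.

```python
import bisect

# Hypothetical employee rows, physically stored sorted by department
# (the clustered index). sorted() is stable, so ties keep input order.
employees = sorted(
    [
        ("engineering", "dana"),
        ("sales", "erin"),
        ("engineering", "alex"),
        ("hr", "chris"),
        ("sales", "blake"),
    ],
    key=lambda row: row[0],
)
departments = [row[0] for row in employees]  # parallel list of cluster keys


def employees_in(department):
    # Rows with the same department are adjacent, so two binary searches
    # bound the whole group without touching any other rows.
    lo = bisect.bisect_left(departments, department)
    hi = bisect.bisect_right(departments, department)
    return employees[lo:hi]


# Non-clustered index on employee name: a separate map from name to the
# row's position, leaving the physical order untouched.
by_name = {name: pos for pos, (_, name) in enumerate(employees)}

print(employees_in("engineering"))  # [('engineering', 'dana'), ('engineering', 'alex')]
print(employees[by_name["erin"]])   # ('sales', 'erin')
```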

Index Distribution

With distributed databases, data is distributed and replicated across multiple nodes, so retrieving a collection of data may require fetching rows from multiple nodes. An index on a column other than the row key would likewise need to be distributed across multiple nodes, for example as shards. Long-running queries can benefit from such shard-based indexing for fast retrieval of data sets.
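Below is a rough sketch of a distributed index over a non-row-key column. It assumes that each node indexes only the rows it stores, which is a local index. The `Node` class, the `city` column, and the crc32-modulo placement are illustrative assumptions, not Cassandra's actual implementation. The indexed value says nothing about which node holds a matching row. A lookup therefore has to fan out to every node and merge the results.

```python
import zlib
from collections import defaultdict


class Node:
    """Hypothetical node holding a slice of the rows plus a local index
    over a non-row-key column ('city') covering only those rows."""

    def __init__(self, name):
        self.name = name
        self.rows = {}                      # row_key -> row
        self.city_index = defaultdict(set)  # city -> {row_key, ...}

    def put(self, row_key, row):
        self.rows[row_key] = row
        self.city_index[row["city"]].add(row_key)

    def query_city(self, city):
        # Answers only for rows stored on this node.
        return [self.rows[k] for k in sorted(self.city_index.get(city, ()))]


nodes = [Node("n1"), Node("n2"), Node("n3")]


def owner(row_key):
    # Hash partitioning on the row key. crc32 keeps placement stable
    # across runs, unlike Python's salted built-in hash() for strings.
    return nodes[zlib.crc32(row_key.encode()) % len(nodes)]


def put(row_key, row):
    owner(row_key).put(row_key, row)


def find_by_city(city):
    # Scatter-gather: ask every node, then merge the partial results.
    results = []
    for node in nodes:
        results.extend(node.query_city(city))
    return results


for key, city in [("u1", "Oslo"), ("u2", "Rome"), ("u3", "Oslo"), ("u4", "Lima")]:
    put(key, {"id": key, "city": city})

print(sorted(r["id"] for r in find_by_city("Oslo")))  # ['u1', 'u3']
```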

Due to its peer-to-peer architecture, each node in a Cassandra cluster holds an identical configuration. Data replication, eventual consistency, and the partitioning scheme are three important aspects of data distribution.
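Partitioning and replication can be illustrated with a toy token ring in Python. This simplified sketch is loosely modeled on the idea behind Cassandra's SimpleStrategy. The first replica goes to the node that owns the key's token, and further replicas go to the next nodes clockwise. The `TokenRing` class, the node names and tokens, the 0 to 99 token space, and the crc32 hash are all assumptions made for the example. Because every node would share the same ring configuration, any node running this logic reaches the same replica list for a given key.

```python
import bisect
import zlib


class TokenRing:
    """Hypothetical toy ring: each node owns the tokens up to and
    including its own token; replicas go to the next nodes clockwise."""

    TOKEN_SPACE = 100  # toy range 0..99; real partitioners use far more

    def __init__(self, node_tokens, replication_factor):
        # node_tokens maps a node name to its (made-up) ring token.
        ordered = sorted((token, name) for name, token in node_tokens.items())
        self.tokens = [token for token, _ in ordered]
        self.nodes = [name for _, name in ordered]
        self.replication_factor = replication_factor

    def token_for(self, row_key):
        # Partitioning: hash the row key onto the ring.
        return zlib.crc32(row_key.encode()) % self.TOKEN_SPACE

    def replicas_for_token(self, token):
        # The first node whose token is >= the key's token is the primary
        # owner; past the last token we wrap around to the first node.
        start = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        count = min(self.replication_factor, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]

    def replicas(self, row_key):
        return self.replicas_for_token(self.token_for(row_key))


ring = TokenRing({"A": 0, "B": 25, "C": 50, "D": 75}, replication_factor=3)
print(ring.replicas_for_token(30))  # ['C', 'D', 'A']
print(ring.replicas_for_token(80))  # ['A', 'B', 'C']
```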

Please refer to Chapter 1 for more details about replication factor, strategy class, and read/write consistency.

Indexing in Cassandra
