Data Modeling Approaches for Big Data and Analytics Solutions - Big Data Imperatives


Table 6-2. Relational Data Model vs. Cassandra Data Model

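Relational Data Model | Cassandra Data Model (Standard) | Cassandra Data Model (Super)
Server based          | Cluster based                   | Cluster based
Database              | Key Space                       | Key Space
Table                 | Column Family                   | Column Family
Primary Key           | Key                             | Key
Column Name           | Column Name                     | Super Column Name
Column Value          | Column Value                    | Column Value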
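
The mapping in Table 6-2 can be pictured as nested maps. The following is a minimal sketch that uses plain Python dictionaries in place of Cassandra storage; all names (`users`, `user_addresses`, `get_column`) and the sample values are hypothetical and exist only for illustration.

```python
# Plain dicts stand in for Cassandra storage; every name here is hypothetical.

# Standard column family: row key -> {column name -> column value}
users = {
    "jsmith": {"first_name": "John", "city": "Austin"},
}

# Super column family:
# row key -> {super column name -> {column name -> column value}}
user_addresses = {
    "jsmith": {
        "home": {"city": "Austin", "zip": "78701"},
        "work": {"city": "Dallas", "zip": "75201"},
    },
}

# A key space groups column families, much as a database groups tables.
keyspace = {"users": users, "user_addresses": user_addresses}


def get_column(ks, column_family, key, column, super_column=None):
    """Look up one column value; pass super_column for a super column family."""
    row = ks[column_family][key]
    if super_column is not None:
        row = row[super_column]
    return row[column]


city = get_column(keyspace, "users", "jsmith", "city")
work_zip = get_column(keyspace, "user_addresses", "jsmith", "zip", super_column="work")
```

In the standard case a lookup needs the key and a column name; in the super case it needs one extra level, the super column name.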

Designing Cassandra Data Structures

1. Entities and Point of Interest: The best way to model a Cassandra data structure is to identify the entities that will be queried most often and build the entire structure around them. The activities that user applications perform (generally the use cases), and how the data is retrieved and displayed, are the areas of interest when designing the Cassandra column families.

2. De-normalization: Normalization is the set of rules established to aid in the design of tables and their relationships in any RDBMS. The benefits of normalization are:

• Avoiding repetitive entries

• Reduction of storage space

• Prevention of schema restructuring for future needs

• Improved speed and flexibility of SQL queries, joins, sorts, and search results

Achieving similar performance at big data scale is a challenge with traditional relational data models. Therefore, most big data applications adopt de-normalization to achieve performance. Cassandra does not support foreign key relationships as a relational database does, so the better approach is to de-normalize the data model. The important point is that, instead of modeling the data first and then framing the queries, with Cassandra the queries are modeled first and the data is then framed around them.
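
The query-first idea can be illustrated with a small sketch. It assumes a hypothetical use case ("list the videos uploaded by a given user") and uses plain Python dictionaries rather than a real Cassandra client; the entity names, sample data, and the `build_videos_by_user` helper are invented for this example.

```python
from collections import defaultdict

# Normalized, relational-style data (hypothetical): answering
# "which videos did this user upload?" requires joining two tables.
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
videos = [
    {"video_id": 10, "user_id": 1, "title": "Intro"},
    {"video_id": 11, "user_id": 1, "title": "Part 2"},
    {"video_id": 12, "user_id": 2, "title": "Demo"},
]


def build_videos_by_user(users, videos):
    """Build a de-normalized structure shaped for one query.

    Row key: user name. Columns: video_id -> title.
    Titles are deliberately duplicated here so that no join is
    needed at read time.
    """
    column_family = defaultdict(dict)
    for video in videos:
        user_name = users[video["user_id"]]["name"]
        column_family[user_name][video["video_id"]] = video["title"]
    return dict(column_family)


videos_by_user = build_videos_by_user(users, videos)

# The query is now a single key lookup instead of a join.
alice_videos = videos_by_user["alice"]
```

The trade-off is the one the text describes: data is repeated and storage grows, but each expected query is served by one structure keyed exactly the way that query asks for it.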
