NoSQL: Cassandra Basics - Beginning Apache Cassandra Development


Partition tolerance implies that if a node or a couple of nodes are down, the system is still able to serve read/write requests. In scalable systems built to deal with a massive volume of data (in petabytes), such failures are likely to occur often. Hence, such systems have to be partition tolerant. Cassandra's storage architecture enables this as well.
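
The behavior described above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not Cassandra's storage architecture: the `Node` and `ToyCluster` classes are hypothetical, and every row is simply copied to all nodes that are reachable at write time.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One storage node; `up` is False when the node is down or unreachable."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical toy cluster that copies every row to all of its nodes."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def write(self, key, value):
        # Store the value on every node that is currently reachable.
        live = [n for n in self.nodes if n.up]
        for node in live:
            node.data[key] = value
        return len(live) > 0  # served if at least one node took the write

    def read(self, key):
        # Answer from the first reachable node that holds the key.
        for node in self.nodes:
            if node.up and key in node.data:
                return node.data[key]
        return None


cluster = ToyCluster(["node1", "node2", "node3"])
cluster.write("imvivek", "first tweet")
cluster.nodes[0].up = False  # one node goes down
cluster.nodes[1].up = False  # a second node goes down
served = cluster.write("apress_team", "hello")
print(served, cluster.read("imvivek"), cluster.read("apress_team"))
```

With two of the three nodes down, the remaining node still accepts the write and answers both reads. Nodes that were down miss the write, which is where the consistency trade-off comes in.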

Consistency means that data is consistent across distributed nodes. Strong consistency means that every node in a cluster holds the most up-to-date data. On each read/write request, the most recent rows can be read or written, at the cost of added latency (a downside of NoSQL) on each request, by ensuring that data is synchronized on all the replicas. Cassandra offers eventual consistency, with a configurable consistency level for each read/write request. We will discuss the various consistency level options in detail in the coming chapters.
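
The idea of choosing a consistency level per request can be sketched with a toy replica set. The functions `write`, `read`, and `overlaps` are hypothetical helpers, not the Cassandra driver API. The sketch assumes the general quorum rule that a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replica count; the read deliberately contacts the replicas least likely to be up to date, to show the worst case.

```python
def write(replicas, key, value, timestamp, acks):
    """Update only the first `acks` replicas synchronously; the rest lag behind."""
    for replica in replicas[:acks]:
        replica[key] = (timestamp, value)


def read(replicas, key, contacted):
    """Ask the last `contacted` replicas and return the newest value they hold."""
    answers = [r[key] for r in replicas[-contacted:] if key in r]
    return max(answers)[1] if answers else None


def overlaps(read_count, write_count, replica_count):
    # Quorum rule: the replicas read and the replicas written share at least one.
    return read_count + write_count > replica_count


replicas = [{}, {}, {}]  # three replicas of one row
write(replicas, "followers", 10, timestamp=1, acks=3)
write(replicas, "followers", 11, timestamp=2, acks=2)  # third replica is now stale

stale = read(replicas, "followers", contacted=1)  # 1 + 2 is not greater than 3
fresh = read(replicas, "followers", contacted=2)  # 2 + 2 is greater than 3
print(stale, fresh)  # prints: 10 11
```

Contacting more replicas per request buys fresher data at the price of more latency, which is the trade-off a per-request consistency level lets the caller make.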

Budding Schema

A structured or fixed schema defines the number of columns and their data types before implementation. Any alteration to the schema, such as adding columns, requires a migration plan across the schema. For semistructured or unstructured data formats, where the number of columns and data types may vary across rows, a static schema doesn't fit very well. That's where a budding or dynamic schema is the best fit.

Figure 1-2 presents four records containing Twitter-like data for a particular user id. Here, the row for user id “imvivek” consists of three columns: “tweet body”, “followers”, and “retweeted by”. But the row for user “apress_team” has only the column “followers”. For unstructured data such as server logs, the number of fields may vary from row to row. This requires the addition of columns “on the fly”, a strong requirement for NoSQL databases. A traditional RDBMS can handle such a data set only in a static way and, unlike Cassandra, cannot scale to a million columns per row in each partition. With predefined models in the RDBMS world, handling frequent schema changes is certainly not a workable option. If we attempted to support dynamic columns there, we might end up with many null columns, and having default null values for multiple columns per row is certainly not desirable. With Cassandra we can have as many columns as we want (up to 2 billion)! Another option is to define a data type for column names (the comparator), for example a column name of type integer, which is not possible with an RDBMS.
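
The contrast between sparse rows and a fixed schema can be sketched in plain Python. This is a minimal illustration, not Cassandra code: only the row keys and column names come from the text above, while the column values and the `log_row` example are invented for the sketch.

```python
# Sparse rows: each row stores only the columns it actually has.
# Column values are invented; only the column names come from the text.
rows = {
    "imvivek": {
        "tweet body": "hello world",
        "followers": "apress_team",
        "retweeted by": "some_user",
    },
    "apress_team": {"followers": "imvivek"},
}

# Adding a column "on the fly" touches one row and needs no migration.
rows["apress_team"]["tweet body"] = "new book out"

# Fixed-schema view: every row must carry every column, so gaps become nulls.
all_columns = sorted({name for row in rows.values() for name in row})
fixed = {key: [row.get(name) for name in all_columns] for key, row in rows.items()}
null_count = sum(value is None for values in fixed.values() for value in values)

# Column names of type integer, kept in numeric order much as an integer
# comparator would order them (as strings, "10" would sort before "2").
log_row = {10: "disk full", 2: "login", 33: "logout"}
ordered_names = sorted(log_row)

print(all_columns, null_count, ordered_names)
```

Even with two rows, the fixed-schema view has to pad the “apress_team” row with a null for “retweeted by”; with thousands of optional columns the padding would dominate the table.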
