Partition tolerance means that if one node or a few nodes are down,
the system can still serve read/write requests. In scalable systems
built to handle massive volumes of data (petabytes), such failures
are likely to occur often. Hence, such systems have to be partition
tolerant. Cassandra's storage architecture enables this as well.
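
As a rough illustration of the idea, the sketch below models a key that is copied to three nodes and shows requests still succeeding after one node goes down. It is a toy model under stated assumptions, not Cassandra's storage architecture: the `Node` class, the `write` and `read` helpers, and the key and values are all hypothetical.

```python
# Toy model: a key is copied to several nodes, and requests succeed
# while at least one replica is reachable. All names are illustrative.

class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

def write(replicas, key, value):
    """Store the value on every reachable replica; return how many took it."""
    live = [node for node in replicas if node.up]
    for node in live:
        node.data[key] = value
    return len(live)

def read(replicas, key):
    """Return the value from the first reachable replica that has the key."""
    for node in replicas:
        if node.up and key in node.data:
            return node.data[key]
    raise RuntimeError("no reachable replica holds " + key)

replicas = [Node("n1"), Node("n2"), Node("n3")]
write(replicas, "user:imvivek", "hello")

replicas[0].up = False                      # one node goes down
acks = write(replicas, "user:imvivek", "hello again")
value = read(replicas, "user:imvivek")
print(acks, value)
```

Note that the node that was down missed the second write, so it holds an older value when it comes back. That gap is where the consistency discussion begins.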
Consistency means that data is consistent across distributed nodes.
Strong consistency means that every node in a cluster holds the most
recently updated data. It can be achieved by making each read and
write request wait until the data is synchronized on all the
replicas, at the cost of added latency on every request (a downside
of NoSQL). Cassandra offers eventual consistency, along with a
configurable consistency level for each read/write request. We will
discuss the various consistency level options in detail in the
coming chapters.
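
To make the trade-off concrete, here is a minimal sketch of per-request consistency levels. The level names ONE, QUORUM, and ALL are real Cassandra consistency levels, but everything else is an assumption made for illustration: the `Replica` class, the `required`, `write`, and `read` helpers, and the rule that a write reaches only the replicas its level requires. That last rule exaggerates replica lag so that stale reads become visible.

```python
# Toy model of tunable consistency. Not Cassandra's implementation.

def required(level, n):
    """Number of replicas a request must touch at the given level."""
    return {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}[level]

class Replica:
    def __init__(self):
        self.store = {}          # key -> (timestamp, value)

def write(replicas, key, value, timestamp, level):
    # Simplification: only the first `need` replicas get the write,
    # so the rest stay stale until a later write reaches them.
    need = required(level, len(replicas))
    for replica in replicas[:need]:
        replica.store[key] = (timestamp, value)

def read(replicas, key, level):
    # Simplification: contact the last `need` replicas and return the
    # value with the newest timestamp among their answers.
    need = required(level, len(replicas))
    answers = [r.store[key] for r in replicas[-need:] if key in r.store]
    return max(answers)[1] if answers else None

cluster = [Replica(), Replica(), Replica()]
write(cluster, "status", "v1", 1, "ALL")
write(cluster, "status", "v2", 2, "ONE")     # reaches one replica only

stale = read(cluster, "status", "ONE")       # contacts a lagging replica
fresh = read(cluster, "status", "ALL")       # contacts every replica

write(cluster, "status", "v3", 3, "QUORUM")
overlap = read(cluster, "status", "QUORUM")  # read and write sets intersect
print(stale, fresh, overlap)
```

In this model a read at ONE can return an older value, while a read at ALL, or a QUORUM read after a QUORUM write, sees the latest one. The stricter levels touch more replicas per request, which is where the extra latency comes from.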
Budding Schema
A structured or fixed schema defines the number of columns and their data types before
implementation. Any alteration to the schema, such as adding columns, requires a migration
plan across the schema. For semistructured or unstructured data formats, where the number
of columns and data types may vary across rows, a static schema doesn't fit very well.
That's where a budding, or dynamic, schema is the best fit.
Figure 1-2 presents four records containing Twitter-like data for a particular user id.
Here, the row for user id imvivek consists of three columns: “tweet body”, “followers”,
and “retweeted by”. But the row for user “apress_team” has only the column “followers”.
For unstructured data such as server logs, the number of fields may vary from row to
row. This requires the addition of columns “on the fly”, a strong requirement for
NoSQL databases. A traditional RDBMS can handle such a data set in a static way, but
unlike Cassandra, an RDBMS cannot scale to a million columns per row in each
partition. With predefined models in the RDBMS world, handling frequent schema
changes is certainly not a workable option. If we attempt to support dynamic columns
there, we may end up with many null columns, and having default null values for
multiple columns per row is certainly not desirable. With Cassandra we can have as
many columns as we want (up to 2 billion). Another option is to define a data type
for column names (the comparator), for example a column name of type integer, which
is not possible with an RDBMS.
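
The contrast between the two models can be sketched with plain Python dictionaries. This is an illustration only, not Cassandra code: the column values below are made up (the contents of Figure 1-2 are not reproduced here), and the `log_row` example with integer column names is a hypothetical stand-in for what a comparator-ordered row might look like.

```python
# Dynamic schema: each row stores only the columns it actually has.
# All values are invented placeholders.
rows = {
    "imvivek": {
        "tweet body": "sample tweet",
        "followers": "user_a",
        "retweeted by": "user_b",
    },
    "apress_team": {
        "followers": "user_c",
    },
}

# Static schema: every row must carry every column, so gaps become nulls.
all_columns = sorted({name for row in rows.values() for name in row})
static_table = {
    user: [row.get(name) for name in all_columns]
    for user, row in rows.items()
}
null_cells = sum(cell is None for cells in static_table.values() for cell in cells)

# Adding a column "on the fly" touches one row and needs no migration.
rows["apress_team"]["tweet body"] = "another sample tweet"

# Column names need not be strings. With integer names, a row can be
# kept in numeric order, much as a comparator would order it.
log_row = {3: "third field", 1: "first field", 2: "second field"}
ordered_names = sorted(log_row)

print(all_columns, null_cells, ordered_names)
```

With only two rows and three columns, the static layout already carries two null cells. The number grows with every column that only some rows use.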
 