Consider an example in which a viewer transmission analytics system captures
random logs for each transmitted program and for the users who watched or are
watching it. The first question we need to ask is: is this really a big data problem?
Yes. Here we are talking about logs; in a country like India the user base is huge, and
so is the volume of logs captured 24x7. Also, the transmitted logs may be random in
nature, meaning their structure is not fixed: they can be semi-structured or totally
unstructured. That is where an RDBMS fails to deliver, because of evolving-schema and
scalability problems (see previous section).
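To make the schema problem concrete, here is a minimal Python sketch. The log fields (`program_id`, `user_id`, `ts`, `event`, `device`, `position_s`, `ad_id`) and the fixed column set are hypothetical, invented for illustration. Records from different sources carry different attribute sets, so a fixed relational column list rejects most of them (or forces repeated schema changes), whereas a column-family-style row, a partition key plus an open-ended set of named columns, simply stores whatever arrives.

```python
from collections import defaultdict

# Hypothetical raw viewer logs: each source emits a different set of fields.
LOGS = [
    {"program_id": "p1", "user_id": "u1", "ts": 1, "event": "start"},
    {"program_id": "p1", "user_id": "u2", "ts": 2, "event": "start", "device": "tv"},
    {"program_id": "p2", "user_id": "u1", "ts": 3, "event": "pause", "position_s": 310},
    {"program_id": "p1", "user_id": "u1", "ts": 4, "event": "ad_view", "ad_id": "a9"},
]

# A relational table fixes its columns up front (hypothetical schema).
FIXED_COLUMNS = {"program_id", "user_id", "ts", "event"}


def fits_fixed_schema(record: dict) -> bool:
    """True only if the record has no fields outside the fixed column set."""
    return set(record) <= FIXED_COLUMNS


def store_wide_rows(logs):
    """Group records into column-family-style rows keyed by program_id.

    Each row maps a clustering key (ts, user_id) to whatever columns that
    record happened to carry, so a new field needs no schema change.
    """
    rows = defaultdict(dict)
    for rec in logs:
        cells = {k: v for k, v in rec.items() if k not in ("program_id", "ts", "user_id")}
        rows[rec["program_id"]][(rec["ts"], rec["user_id"])] = cells
    return rows


rejected = [r for r in LOGS if not fits_fixed_schema(r)]
rows = store_wide_rows(LOGS)
print(len(rejected))       # 3 records need columns the fixed schema lacks
print(sorted(rows["p1"]))  # [(1, 'u1'), (2, 'u2'), (4, 'u1')]
```

The dictionaries only model the idea; a real deployment would express the open-ended attributes through Cassandra's own data model rather than Python objects.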
To summarize, build a NoSQL-based solution if:
• The data format is semi-structured or unstructured
• The RDBMS reaches its storage limit and cannot scale further
• RDBMS-specific features such as relations and indexes can be sacrificed in favor
of denormalized but distributed data
• Data redundancy is not an issue and a read-before-write approach can be applied
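The denormalization trade-off in the list above can be sketched in a few lines of Python. This is an illustrative model, not Cassandra's API: the two "tables" (`views_by_program`, `views_by_user`) and the helper `record_view` are hypothetical dictionaries and functions standing in for query-specific tables. Each view event is written once per query pattern, so every read is a single-key lookup with no join, at the cost of storing the same data redundantly.

```python
from collections import defaultdict

# Hypothetical query-specific "tables": one per access pattern.
views_by_program = defaultdict(list)  # answers: who watched program X?
views_by_user = defaultdict(list)     # answers: what did user Y watch?


def record_view(program_id: str, user_id: str, ts: int) -> None:
    """Write one view event into every table that needs it (denormalized)."""
    views_by_program[program_id].append((ts, user_id))
    views_by_user[user_id].append((ts, program_id))


for program_id, user_id, ts in [("p1", "u1", 1), ("p2", "u1", 2), ("p1", "u2", 3)]:
    record_view(program_id, user_id, ts)

# Each query is a single-key lookup; no join between tables is needed.
print(views_by_program["p1"])  # [(1, 'u1'), (3, 'u2')]
print(views_by_user["u1"])     # [(1, 'p1'), (2, 'p2')]
```

Three events end up as six stored entries: that duplication is the redundancy the list says must be acceptable.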
In the next section, we will discuss how Cassandra can be the best fit to address such
technical and functional challenges.
Introducing Cassandra
Cassandra is an open-source, column-family-oriented database. Originally developed at
Facebook, it entered the Apache Incubator in 2009 and has been an Apache top-level
project (TLP) since 2010. Cassandra comes with many important features; some are
listed below:
• Distributed database
• Peer-to-peer architecture
• Configurable consistency
• CQL (Cassandra Query Language)
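Configurable consistency is easiest to see through the standard replica-overlap arithmetic. The sketch below is a simplification that assumes a single data center and ignores failures, hinted handoff, and repair. With replication factor RF, a write acknowledged by W replicas and a read that consults R replicas must share at least one replica whenever R + W > RF, so the read sees the latest write. QUORUM corresponds to floor(RF/2) + 1 replicas. The level names follow Cassandra's, but the functions `replicas_for` and `read_sees_latest_write` are hypothetical helpers, not driver APIs.

```python
def replicas_for(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level (single-DC sketch)."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must intersect every write replica set."""
    return replicas_for(read_level, rf) + replicas_for(write_level, rf) > rf


RF = 3
for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read, write, read_sees_latest_write(read, write, RF))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
```

Choosing the levels per request is the trade-off being configured: ONE/ONE is fastest but may return stale data, while QUORUM/QUORUM at RF = 3 still reads its own writes with one replica down.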
Distributed Databases