Consider an example in which a viewer transmission analytics system captures
logs for each transmitted program and for the users who watched or are watching
it. The first question to ask is whether this is really a big data problem. It
is: in a country like India the user base is huge, and so is the volume of logs
captured 24x7. The structure of the transmitted logs is also not fixed; they may
be semi-structured or completely unstructured. This is where an RDBMS will fail
to deliver, because of the evolving schema and the scalability problems (see
previous section).
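To make "semi-structured" concrete, here is a minimal Python sketch. The log lines and field names (program_id, user_id, and so on) are hypothetical and are not taken from any real log format. It shows that two entries for the same program may share only some fields, and that a third entry may have no structure at all, which is awkward for a fixed relational schema.

```python
import json

# Hypothetical raw log lines; the field names are illustrative assumptions.
RAW_LOGS = [
    '{"program_id": "p1", "user_id": "u1", "event": "watching", "device": "tv"}',
    '{"program_id": "p1", "user_id": "u2", "event": "watched", "duration_sec": 1800}',
    'p2 u3 free-form text with no structure',
]


def parse_log(line):
    """Return the fields a line carries; keep unparseable text under 'raw'."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return {"raw": line}
    return record if isinstance(record, dict) else {"raw": line}


def group_by_program(lines):
    """Group records by program_id; records without one go under None."""
    grouped = {}
    for line in lines:
        record = parse_log(line)
        grouped.setdefault(record.get("program_id"), []).append(record)
    return grouped


def distinct_columns(records):
    """Union of all field names seen, i.e. the columns a fixed schema would need."""
    return sorted({key for record in records for key in record})


if __name__ == "__main__":
    grouped = group_by_program(RAW_LOGS)
    print(distinct_columns(grouped["p1"]))
```

Each new kind of log entry can widen the set of columns, so a fixed table definition would need repeated schema changes, while a store that allows varying columns per row can accept the records as they arrive.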
To summarize, build a NoSQL-based solution if:
The data format is semi-structured or unstructured
The RDBMS has reached its storage limit and cannot scale further
RDBMS-specific features such as relations and indexes can be sacrificed in favor of denormalized but distributed data
Data redundancy is not an issue and a read-before-write approach can be applied
In the next section, we will discuss why Cassandra can be a good fit for
addressing such technical and functional challenges.
Introducing Cassandra
Cassandra is an open-source, column-family-oriented database. Originally developed at
Facebook, it entered the Apache Incubator in 2009 and became an Apache top-level
project (TLP) in 2010. Cassandra comes with many important features; some are listed below:
Distributed database
Peer-to-peer architecture
Configurable consistency
CQL (Cassandra Query Language)
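As a rough illustration of what configurable consistency means in practice, the sketch below works through the replica arithmetic in Python. It assumes a replication factor N and lets the caller choose how many replicas must acknowledge a read (R) and a write (W). The function names are hypothetical and are not part of any Cassandra driver. A read is guaranteed to overlap the latest acknowledged write when R + W > N.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    # Quorum reads plus quorum writes overlap on at least one replica.
    print(q, reads_see_latest_write(q, q, n))
    # Single-replica reads and writes trade that guarantee for lower latency.
    print(reads_see_latest_write(1, 1, n))
```

Choosing smaller values of R and W favors latency and availability, while larger values favor consistency; the application can pick the trade-off per operation.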
Distributed Databases