limitations that exist in traditional RDBMS platforms. In the new architecture the RDBMS,
DBMS, and NoSQL technologies each have a role, and can be deployed as needed to solve a
given requirement.
Distributed file-based storage—data is stored in files, which is cheaper than storing it in a
database. Additionally, data is distributed across systems, providing built-in redundancy.
Linearly scalable infrastructure—each piece of infrastructure added contributes its full capacity
of CPU, memory, and storage, so the platform scales linearly with the hardware.
Programmable APIs—all modules of data processing are driven by programmable application
programming interfaces (APIs), which allow for parallel processing without the limitations
imposed by concurrency. The same data can be processed across systems for different
purposes, or the same logic can be processed across different systems. Several case studies
document these techniques.
High-speed replication—data can be replicated at high speeds across the network.
High availability—data and the infrastructure are always available and accessible to users.
Localized processing of data and storage of results—the ability to process data and store results
locally, meaning compute and storage occur on the same disks within the storage architecture.
This requires replicated copies of the data to be kept across disks, so that processing can run
wherever a copy resides (the sketch after this list illustrates this idea).
Fault tolerance—with extensive replication and distributed processing, the work of failed
systems can be rebalanced across surviving nodes with relative ease, as web users and
applications demand.
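To make the locality, replication, and fault-tolerance ideas concrete, here is a minimal Python sketch of block placement and locality-aware scheduling. It is an illustrative toy model, not the API of Hadoop or any other platform discussed in this chapter; the names (Node, place_block, run_local) and the replication factor are assumptions made for the example.

import random

# Toy model of block placement, locality-aware scheduling, and failover.
# All names here are illustrative inventions for this sketch, not the API
# of Hadoop or any other real platform.

REPLICATION_FACTOR = 3  # assumed here; HDFS also defaults to 3 replicas

class Node:
    def __init__(self, name):
        self.name = name
        self.blocks = {}   # block_id -> data held on this node's local disks
        self.alive = True

    def process(self, block_id, fn):
        # Localized processing: the computation runs on the node that already
        # stores the block, and the result is stored locally next to the data.
        result = fn(self.blocks[block_id])
        self.blocks[block_id + ":result"] = result
        return result

def place_block(nodes, block_id, data):
    # Replicate each block onto REPLICATION_FACTOR distinct nodes.
    for node in random.sample(nodes, REPLICATION_FACTOR):
        node.blocks[block_id] = data

def run_local(nodes, block_id, fn):
    # Prefer any live node that already holds the block (data locality).
    # A failed replica is simply skipped: fault tolerance through replication.
    for node in nodes:
        if node.alive and block_id in node.blocks:
            return node.process(block_id, fn)
    raise RuntimeError("no live replica of " + block_id)

if __name__ == "__main__":
    cluster = [Node("node%d" % i) for i in range(5)]
    place_block(cluster, "blk-1", "to be or not to be")
    cluster[0].alive = False  # simulate a node failure
    print(run_local(cluster, "blk-1", lambda text: len(text.split())))

Because every block has several replicas, losing a node only redirects the work to another copy; at scale, this is the rebalancing behavior described in the list above.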
With the features and capabilities discussed here, the limitations of distributed data processing with
relational databases are no longer a real barrier. The new-generation architecture has created a scalable
and extensible data processing environment for web applications and has been widely adopted by
companies that use web platforms. Over the last decade many of these technologies have been committed
back to the open-source community for further development by innovators across the world (see the
Apache Software Foundation website, apache.org, for committers across projects). The new-generation
data processing platforms, including Hadoop, Hive, HBase, Cassandra, MongoDB, CouchDB, Redis,
Neo4j, DynamoDB, and more, are all products of these architectural pursuits, and are discussed in this chapter.
There is a continuum of technology development in this direction; by the time you finish this chapter,
there will be newer developments, which can be found on the companion website for this book
( http://booksite.elsevier.com/9780124058910 ).
Big Data processing requirements
What is unique about Big Data processing? What makes it different, or what mandates new thinking? To
understand this better, let us look at the underlying requirements. We can classify the requirements
according to the five main characteristics of Big Data:
Volume:
Size of the data to be processed is large—it needs to be broken into manageable chunks.
Data needs to be processed in parallel across multiple systems.
Data needs to be processed across several program modules simultaneously (a short sketch of this
chunk-and-process pattern follows the list).
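To make the volume requirement concrete, below is a minimal sketch of the chunk-and-process pattern in Python. The chunk size, function names, and sample data are assumptions chosen for the example; they stand in for the block sizes and program modules of a real platform.

from multiprocessing import Pool

CHUNK_SIZE = 4  # lines per chunk, assumed for the example; real platforms
                # split data into blocks of tens to hundreds of megabytes

def chunks(lines, size):
    # Break a large dataset into manageable, independently processable pieces.
    for i in range(0, len(lines), size):
        yield lines[i:i + size]

def count_words(chunk):
    # One program module applied to one chunk; each call runs in its own
    # worker process, so chunks are processed in parallel.
    return sum(len(line.split()) for line in chunk)

if __name__ == "__main__":
    data = ["record %d with a few words" % i for i in range(20)]
    with Pool(processes=4) as pool:
        partials = pool.map(count_words, list(chunks(data, CHUNK_SIZE)))
    print(sum(partials))  # combine the partial results from all workers

Because each chunk is independent, adding worker processes (or machines) increases throughput without changing the processing logic, which is the linear-scalability property described earlier.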