Big Data Warehouse System Requirements and Hybrid Architectures
The system requirements for big data solutions differ fundamentally from those of traditional
data management and analytics solutions. Big data solutions require a technology, or a
combination of technologies, capable of:
- Managing the scale and wide variety of data types, covering both “data at rest” and
  “data in motion” scenarios
- Managing data distributed across thousands of processors; in many situations the data
  clusters and grids may be geographically distributed
- Integrating any data source whose structure is not known in advance (that is,
  supporting schema-on-read)
- Managing and executing workflows that run across hundreds or thousands of distributed
  nodes
- Providing built-in semantics to manage the trade-offs among consistency, availability,
  and partition tolerance
- Supporting extreme mixed workloads, such as depth queries as well as breadth queries
  ranging from ad hoc queries to strategic analysis, while data is loaded in both batch
  and streaming fashion
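One common way a distributed store turns the consistency/availability trade-off into a tunable setting is quorum replication, the approach popularized by Dynamo-style systems. The sketch below illustrates that general technique under stated assumptions; it is not a feature of any particular product, and the class and field names are hypothetical. With N replicas, a read waits for R replies and a write for W acknowledgments; choosing R + W > N forces every read quorum to overlap every write quorum, while smaller values let operations succeed with more replicas unreachable.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    """Hypothetical replication settings for a single data item."""

    n: int  # replicas that hold a copy of the item
    r: int  # replicas that must answer before a read returns
    w: int  # replicas that must acknowledge before a write succeeds

    def __post_init__(self) -> None:
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("r and w must be between 1 and n")

    def quorums_overlap(self) -> bool:
        # When R + W > N, every read quorum shares at least one replica with
        # every write quorum, so a read reaches a replica holding the latest
        # acknowledged write.
        return self.r + self.w > self.n

    def tolerated_failures(self) -> tuple[int, int]:
        # Unreachable replicas that (reads, writes) can survive and still succeed.
        return self.n - self.r, self.n - self.w


for cfg in (QuorumConfig(3, 2, 2), QuorumConfig(3, 1, 1), QuorumConfig(3, 3, 1)):
    mode = "overlapping quorums" if cfg.quorums_overlap() else "reads may be stale"
    print(cfg, mode, "tolerates (reads, writes):", cfg.tolerated_failures())
```

With N = 3, R = W = 2 keeps the quorums overlapping while tolerating one unreachable replica for both reads and writes; R = W = 1 tolerates two but allows stale reads.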
If these are the requirements for big data solutions, is there an application architecture
that can address all of them? Generally speaking, there are two application architecture
approaches to implementing big data solutions: extended RDBMS architectures, which extend
traditional enterprise data warehouse (EDW) architectures to manage large data volumes,
and hybrid architectures, which employ MapReduce/Hadoop to provide a data platform that
can manage the scale and variety of data types.
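The map-reduce model that hybrid architectures build on can be illustrated with the classic word count. The sketch below simulates the map, shuffle, and reduce phases sequentially in a single process; it does not use Hadoop's actual API, and all function names are hypothetical. In a real cluster, each input split would be handled by a separate mapper on a separate node, and the framework would perform the shuffle over the network.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(splits: list[list[str]]) -> dict[str, int]:
    # Each split stands in for the input of one mapper node; here the
    # "nodes" are simulated one after another in the same process.
    mapped = chain.from_iterable(map_phase(line) for split in splits for line in split)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


splits = [["big data big clusters"], ["data at rest", "data in motion"]]
print(word_count(splits))
# {'big': 2, 'data': 3, 'clusters': 1, 'at': 1, 'rest': 1, 'in': 1, 'motion': 1}
```

Because each mapper only needs its own split and each reducer only needs the values for its own keys, both phases can be spread across many nodes, which is what lets this model scale out.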
A review of current product enhancements from almost all of the major relational
database management system (RDBMS) vendors reveals an interesting pattern: most RDBMS
products have evolved significantly, adding features such as massively parallel
processing (MPP), columnar storage, in-database analytics, and the ability to execute
Hadoop MapReduce jobs in the database itself.
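To show why columnar storage suits analytic queries, the sketch below contrasts a row-oriented layout with a column-oriented one using plain Python lists. It is only an in-memory analogy under that assumption: real columnar engines also compress each column and read it from disk in blocks. The table and function names are hypothetical.

```python
from __future__ import annotations

# Row-oriented layout: all attributes of a record are stored together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 45.5},
]


def to_columns(records: list[dict]) -> dict[str, list]:
    # Columnar layout: one list per attribute, values kept in row order.
    if not records:
        return {}
    return {name: [r[name] for r in records] for name in records[0]}


def total_amount_rows(records: list[dict]) -> float:
    # Iterates over every whole record, as a row store would read each row,
    # even though only "amount" is needed.
    return sum(r["amount"] for r in records)


def total_amount_columns(columns: dict[str, list]) -> float:
    # Reads only the "amount" column; the other columns are never touched.
    return sum(columns["amount"])


columns = to_columns(rows)
print(total_amount_rows(rows), total_amount_columns(columns))  # 245.5 245.5
```

Both layouts give the same answer; the difference is how much data has to be read to get it, which matters when a query aggregates one or two columns of a very wide, very long table.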
This raises a set of interesting questions. How will big data affect your EDW and
business intelligence (BI) investments? Will it replace them? If not, how would you
combine the two technologies within your current data management architecture?
The two technologies serve different purposes, and their strengths complement each
other, together providing a holistic data platform for enterprises to leverage. The big
data warehouse (BDW) can be used as a data ingestion platform to acquire any type of data
of interest at reasonable cost, with little upfront data processing and lower data
modeling and data cleansing overhead. The EDW can then use these data sources to further
enrich its existing facts and dimensions in support of reporting and analytics.
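The division of labor described above can be sketched as follows: raw, loosely structured events are landed in the BDW without upfront modeling, a schema is applied only when they are read, and a derived attribute is then used to enrich an existing EDW dimension. This is a minimal illustration, assuming JSON-lines input; the event fields, the customer dimension, and the "page_views" attribute are all hypothetical.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable, Iterator

# Raw events landed in the BDW as-is: no upfront modeling or cleansing,
# and records do not have to share one structure.
raw_events = [
    '{"customer_id": "C1", "event": "page_view", "page": "/pricing"}',
    '{"customer_id": "C2", "event": "post", "text": "love the product"}',
    '{"customer_id": "C1", "event": "page_view"}',
    "not valid json",
]

# Existing EDW customer dimension (hypothetical attributes).
customer_dim = {
    "C1": {"segment": "enterprise"},
    "C2": {"segment": "smb"},
}


def read_with_schema(lines: Iterable[str], fields: list[str]) -> Iterator[dict]:
    # Schema-on-read: a structure is imposed only when the data is queried.
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable records are skipped at read time
        if isinstance(record, dict):
            # Missing fields become None instead of failing the load.
            yield {field: record.get(field) for field in fields}


def enrich_dimension(dim: dict[str, dict], lines: Iterable[str]) -> dict[str, dict]:
    # Derive a new attribute (page-view count) from raw BDW data and attach
    # it to the existing dimension rows for EDW reporting.
    views = Counter(
        r["customer_id"]
        for r in read_with_schema(lines, ["customer_id", "event"])
        if r["event"] == "page_view"
    )
    return {cid: {**attrs, "page_views": views[cid]} for cid, attrs in dim.items()}


print(enrich_dimension(customer_dim, raw_events))
# {'C1': {'segment': 'enterprise', 'page_views': 2}, 'C2': {'segment': 'smb', 'page_views': 0}}
```

Keeping the raw events untouched in the BDW means a different schema can be applied later for a new question, while the EDW only receives the cleaned, derived attribute it actually reports on.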