Integration of Big Data and Data Warehousing - Data Warehousing in the Age of Big Data


● Pros:

● Scalable design for RDBMS and Big Data processing.

● Modular data integration architecture.

● Heterogeneous physical architecture deployment, providing best-in-class integration at the data processing layer.

● Metadata and MDM solutions can be leveraged with relative ease across the solution.

● Cons:

● Performance of the Big Data connector is the biggest area of weakness.

● Data integration and query scalability can become complex.

A typical use case for this type of integration architecture is found in organizations where the data needs to be integrated into analytics and reporting. Examples include social media data, textual data, and semi-structured data such as emails.
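As a rough illustration of what integrating semi-structured data into reporting can involve, the sketch below turns one raw email into a flat record that could be loaded into a reporting table. The sample message, the `email_to_row` helper, and the chosen field names are all assumptions made for this example; they do not come from the architecture described here.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical sample message, used only for illustration.
RAW_EMAIL = """\
From: alice@example.com
To: support@example.com
Subject: Order 1234 delayed
Date: Mon, 03 Jun 2013 10:15:00 +0000

My order has not arrived yet.
"""


def email_to_row(raw: str) -> dict:
    """Extract the structured header fields and the free-text body of one email."""
    msg = message_from_string(raw)
    return {
        "sender": msg["From"],
        "recipient": msg["To"],
        "subject": msg["Subject"],
        # Normalize the RFC 2822 date header to an ISO 8601 timestamp.
        "sent_at": parsedate_to_datetime(msg["Date"]).isoformat(),
        # The body stays as free text; only the headers are structured.
        "body": msg.get_payload().strip(),
    }


row = email_to_row(RAW_EMAIL)
print(row)
```

The headers map cleanly to columns, while the body remains unstructured text that would still need further processing before it is useful for analytics.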

● Data loading is isolated across the layers. This provides a foundation to create a robust data management strategy.

● Data availability is controlled at each layer, and security rules can be implemented at each layer as required, avoiding any associated overhead for other layers.

● Data volumes can be managed across the individual layers of data based on the data type, the life-cycle requirements for the data, and the cost of the storage.

● Storage performance: Hadoop is designed and deployed on commodity architecture, and its storage costs are very low compared to the traditional RDBMS platform. The performance of the disks for each layer can be configured as needed by the end user.

● Operational costs: in this architecture the operational cost calculation has fixed and variable cost components. The variable costs are related to processing and computing infrastructure and labor costs. The fixed costs are related to RDBMS maintenance and its related costs.

● Pitfalls to avoid:

● Too much data complexity at any one layer of processing.

● Executing large data exchanges between the different layers.

● Incorrect levels of integration (at data granularity).

● Applying overly complex transformations through the connectors.
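The split between fixed and variable operational costs described in the list above can be sketched as a small cost model. This is only a minimal illustration: the `OperationalCost` class, the assumption that variable costs scale linearly with the volume processed, and every figure used are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OperationalCost:
    """Hypothetical monthly cost model with one fixed and two variable components."""

    rdbms_maintenance: float    # fixed: RDBMS maintenance and related costs
    compute_cost_per_tb: float  # variable: processing and computing infrastructure
    labor_cost_per_tb: float    # variable: labor

    def monthly_total(self, tb_processed: float) -> float:
        # Assumption: variable costs grow linearly with the volume processed.
        variable = (self.compute_cost_per_tb + self.labor_cost_per_tb) * tb_processed
        return self.rdbms_maintenance + variable


# Illustrative numbers only; they are not taken from any real deployment.
model = OperationalCost(
    rdbms_maintenance=10_000.0,
    compute_cost_per_tb=50.0,
    labor_cost_per_tb=30.0,
)
total = model.monthly_total(tb_processed=100)
print(total)
```

In a model like this the fixed component is paid regardless of volume, so growth in data volume shows up only in the variable term.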

Big Data appliances

Data warehouse appliances emerged in the last decade as a strong black-box architecture for processing workloads specific to large-scale data. One extension of this architecture is the Big Data appliance. These appliances are configured to handle the rigors of the workloads and complexities of both Big Data and the current RDBMS architecture.

Figure 10.8 shows the conceptual architecture of the Big Data appliance, which includes a layer of Hadoop and a layer of RDBMS. While the physical architectural implementation can differ among vendors like Teradata, Oracle, IBM, and Microsoft, the underlying conceptual architecture remains the same: Hadoop and/or NoSQL technologies are used to acquire, preprocess, and store Big Data, and the RDBMS layers are used to process the output from the Hadoop and NoSQL layers. In-database MapReduce, R, and RDBMS-specific translators and connectors are used in the integrated architecture to manage data movement and transformation within the appliance.
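The two-layer flow described above can be sketched in miniature. In this illustration a plain Python function stands in for the Hadoop/NoSQL preprocessing layer and an in-memory SQLite database stands in for the RDBMS layer. Both stand-ins, the sample posts, and the names `preprocess`, `load_into_rdbms`, and `hashtag_counts` are assumptions made for the example; no vendor appliance works exactly this way.

```python
import sqlite3
from collections import Counter

# Hypothetical raw social media posts acquired by the Big Data layer.
RAW_POSTS = [
    "Loving the new phone #launch",
    "Battery issue again #support",
    "Great event today #launch",
]


def preprocess(posts):
    """Stand-in for the Hadoop/NoSQL layer: extract hashtags and reduce them to counts."""
    counts = Counter()
    for post in posts:
        for token in post.split():
            if token.startswith("#"):
                counts[token[1:].lower()] += 1
    return sorted(counts.items())


def load_into_rdbms(conn, rows):
    """Stand-in for the connector: move the preprocessed output into the RDBMS layer."""
    conn.execute(
        "CREATE TABLE hashtag_counts (hashtag TEXT PRIMARY KEY, mentions INTEGER)"
    )
    conn.executemany("INSERT INTO hashtag_counts VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_into_rdbms(conn, preprocess(RAW_POSTS))

# The RDBMS layer then processes the compact output with ordinary SQL.
top = conn.execute(
    "SELECT hashtag, mentions FROM hashtag_counts "
    "ORDER BY mentions DESC, hashtag LIMIT 1"
).fetchone()
print(top)
```

Only the reduced output crosses the layer boundary here, which is the same idea as keeping large data exchanges between layers to a minimum.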
