Integration of Big Data and Data Warehousing - Data Warehousing in the Age of Big Data

● Pros:

● Scalable design and modular data integration architecture.

● Heterogeneous physical architecture deployment, providing best-in-class integration at the data processing layer.

● Custom-configured to suit the processing requirements of each organization.

● Cons:

● Customized configuration is the biggest weakness.

● Data integration and query scalability can become complex as the configuration changes over time.

This architecture can be deployed to process all types of Big Data and is the closest to a scalable, integrated next-generation data warehouse platform.

● Pitfalls to avoid:

● Custom configuration can be maintenance-heavy.

● Executing large data exchanges between the different layers can cause performance issues.

● Over-reliance on any single transformation layer creates scalability bottlenecks.

● Avoid implementing data security through LDAP integration for the unstructured data layers.
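To make the data-exchange pitfall concrete, the following Python sketch counts how many rows cross a layer boundary when a filter is applied after the transfer versus pushed down to the layer that owns the data. The `Layer` class, the two exchange functions, and the clickstream sample are hypothetical stand-ins, not the API of any particular Hadoop or RDBMS product.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class Layer:
    """Hypothetical stand-in for one processing layer (for example, a Hadoop or RDBMS tier)."""
    name: str
    rows: List[Row]
    rows_shipped: int = 0  # rows this layer has sent across its boundary

    def scan(self, predicate: Optional[Predicate] = None) -> List[Row]:
        # Filter locally when a predicate is supplied; otherwise ship every row.
        result = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_shipped += len(result)
        return result


def naive_exchange(source: Layer, predicate: Predicate) -> List[Row]:
    # Move the whole dataset to the consuming layer, then filter there.
    return [r for r in source.scan() if predicate(r)]


def pushdown_exchange(source: Layer, predicate: Predicate) -> List[Row]:
    # Ask the owning layer to filter first, so only matching rows cross over.
    return source.scan(predicate)


def clickstream(n: int) -> List[Row]:
    # Hypothetical sample data: every tenth event comes from the EU region.
    return [{"id": i, "region": "EU" if i % 10 == 0 else "US"} for i in range(n)]


if __name__ == "__main__":
    is_eu = lambda r: r["region"] == "EU"
    a = Layer("naive", clickstream(10_000))
    b = Layer("pushdown", clickstream(10_000))
    assert naive_exchange(a, is_eu) == pushdown_exchange(b, is_eu)
    print(a.rows_shipped, b.rows_shipped)  # 10000 1000
```

Both strategies return the same 1,000 rows, but the naive exchange moves all 10,000 rows between layers. In a real deployment the same principle applies to pushing predicates and aggregations into the source engine before results are exchanged.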

Data virtualization

Data virtualization technology can be used to create the next-generation data warehouse platform. As shown in Figure 10.9, the biggest benefit of this deployment is the reuse of existing infrastructure for the structured portion of the data warehouse. This approach also provides an opportunity to distribute workload effectively across the platforms, allowing each workload to be optimized on the architecture best suited to it. Data virtualization coupled with a strong semantic architecture can create a scalable solution.
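The idea of reusing existing structured infrastructure while placing Big Data sources behind one access layer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: an in-memory `sqlite3` database stands in for the existing warehouse, a list of JSON lines stands in for semi-structured files on a Big Data platform, and `VirtualLayer`, `customers`, `weblogs`, and `page_views_by_segment` are hypothetical names. Commercial data virtualization products add query optimization, caching, and a semantic layer that this sketch omits.

```python
import json
import sqlite3
from collections import Counter
from typing import Callable, Dict, Iterable, List


class VirtualLayer:
    """Hypothetical sketch: maps logical dataset names to source adapters,
    so consumers query one interface while the data stays where it lives."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[dict]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[dict]]) -> None:
        self._sources[name] = fetch

    def read(self, name: str) -> List[dict]:
        # Fetch on demand from the underlying source; nothing is persisted here.
        return list(self._sources[name]())


# Existing structured warehouse (sqlite3 in memory stands in for the RDBMS).
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, segment TEXT)")
dw.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "retail"), (2, "corporate")])


def customers() -> Iterable[dict]:
    cur = dw.execute("SELECT customer_id, segment FROM customer")
    return ({"customer_id": cid, "segment": seg} for cid, seg in cur)


# Semi-structured Big Data (JSON lines stand in for files on a Hadoop cluster).
RAW_WEBLOGS = [
    '{"customer_id": 1, "page": "/pricing"}',
    '{"customer_id": 2, "page": "/support"}',
    '{"customer_id": 1, "page": "/checkout"}',
]


def weblogs() -> Iterable[dict]:
    return (json.loads(line) for line in RAW_WEBLOGS)


layer = VirtualLayer()
layer.register("customer", customers)
layer.register("weblog", weblogs)


def page_views_by_segment(v: VirtualLayer) -> Counter:
    # Join the two sources inside the virtual layer; neither dataset is copied into the other store.
    segment_of = {c["customer_id"]: c["segment"] for c in v.read("customer")}
    return Counter(segment_of[e["customer_id"]] for e in v.read("weblog"))
```

Calling `page_views_by_segment(layer)` returns `Counter({'retail': 2, 'corporate': 1})`. Consumers see only the logical names registered in the virtual layer, which is where a semantic architecture would map business terms to physical sources.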

● Pros:

● Extremely scalable and flexible architecture.

● Workload optimized.

● Easy to maintain.

● Lower initial cost of deployment.

● Cons:

● Lack of governance can create too many silos and degrade performance.

● Complex query performance can degrade over time.

● Performance at the integration layer may need periodic maintenance.

● Data loading is isolated by layer, which provides a foundation for a robust data management strategy.

● Data availability is controlled at each layer, and security rules can be implemented per layer as required, without adding overhead to other layers.

● Data volumes can be managed for each layer based on the data type, the life-cycle requirements for the data, and the cost of storage.

● Storage performance is matched to the data categories and performance requirements through configurable storage tiers.
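The tiering and life-cycle points above can be expressed as a simple placement policy. The Python sketch below assigns each dataset to a storage tier from its data category, its age in the life cycle, and whether it serves latency-sensitive queries, then totals the storage cost. The tier names, the 365-day archive threshold, the per-GB costs, and the sample datasets are illustrative assumptions, not figures from any vendor.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Tier(Enum):
    HOT = "hot"    # e.g. high-performance warehouse storage
    WARM = "warm"  # e.g. commodity disks in a Hadoop cluster
    COLD = "cold"  # e.g. archive storage


# Illustrative relative cost per GB-month; not real vendor pricing.
COST_PER_GB = {Tier.HOT: 10.0, Tier.WARM: 3.0, Tier.COLD: 0.5}


@dataclass(frozen=True)
class Dataset:
    name: str
    category: str            # "structured", "semi-structured", or "unstructured"
    age_days: int            # how far the data is into its life cycle
    size_gb: float
    latency_sensitive: bool  # True if it serves interactive queries


def assign_tier(ds: Dataset, archive_after_days: int = 365) -> Tier:
    """Place a dataset using life cycle first, then category and performance needs."""
    if ds.age_days > archive_after_days:
        return Tier.COLD  # past its active life cycle
    if ds.latency_sensitive and ds.category == "structured":
        return Tier.HOT   # interactive queries on structured data
    return Tier.WARM      # everything else stays on cheaper bulk storage


def monthly_cost(datasets: Iterable[Dataset]) -> float:
    # Sum each dataset's size multiplied by the per-GB cost of its assigned tier.
    return sum(ds.size_gb * COST_PER_GB[assign_tier(ds)] for ds in datasets)


# Hypothetical sample datasets.
SAMPLE = [
    Dataset("sales_fact", "structured", 30, 200.0, True),
    Dataset("clickstream", "semi-structured", 10, 1000.0, True),
    Dataset("sales_archive", "structured", 900, 500.0, False),
]
```

With these sample datasets, placement is hot, warm, and cold respectively, and `monthly_cost(SAMPLE)` returns 5250.0 cost units. Changing the threshold or the cost table changes placement without touching the datasets, which mirrors managing each layer independently.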