Pros:
Scalable design and modular data integration architecture.
Heterogeneous physical architecture deployment, providing best-in-class integration at the
data processing layer.
Custom-configured to suit the processing rigor that each organization requires.
Cons:
Customized configuration is the biggest weakness.
Data integration and query scalability can become complex as the configuration changes
over time.
This architecture can be deployed to process all types of Big Data, and is the closest to a scalable
and integrated next-generation data warehouse platform.
Pitfalls to avoid:
Custom configuration can be maintenance-heavy.
Executing large data exchanges between the different layers can cause performance issues.
Too much dependency on any one transformation layer creates scalability bottlenecks.
Data security implementation with LDAP integration should be avoided for the unstructured
layers.
Data virtualization
Data virtualization technology can be used to create the next-generation data warehouse platform. As
shown in Figure 10.9, the biggest benefit of this deployment is the reuse of existing infrastructure for
the structured portion of the data warehouse. This approach also provides an opportunity to distribute
the workload effectively across the platforms, thereby allowing each architecture to execute the
optimization best suited to it. Data virtualization coupled with a strong semantic architecture can create
a scalable solution.
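
The idea of a virtualization layer over existing platforms can be illustrated with a minimal sketch. It is an illustration only, not the design shown in Figure 10.9: the names Source, VirtualLayer, register, query, and join are hypothetical, and the two in-memory sources merely stand in for an existing structured warehouse and a Big Data platform. The catalog of logical names plays the role of a very small semantic layer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Source:
    """A physical platform, reduced to a callable that returns rows (hypothetical)."""
    platform: str
    fetch: Callable[[], List[Row]]


class VirtualLayer:
    """Maps logical dataset names to physical sources without copying the data."""

    def __init__(self) -> None:
        self._catalog: Dict[str, Source] = {}

    def register(self, logical_name: str, source: Source) -> None:
        self._catalog[logical_name] = source

    def query(self, logical_name: str,
              predicate: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        # Each request is routed to whichever platform owns the dataset.
        source = self._catalog[logical_name]
        return [row for row in source.fetch() if predicate(row)]

    def join(self, left: str, right: str, key: str) -> List[Row]:
        # Hash join performed in the integration layer, across platforms.
        index: Dict[object, List[Row]] = {}
        for row in self.query(right):
            index.setdefault(row[key], []).append(row)
        return [{**l, **r} for l in self.query(left) for r in index.get(l[key], [])]


# Stand-ins for an existing warehouse table and a Big Data clickstream (assumed data).
customers = Source("rdbms", lambda: [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
])
clicks = Source("hadoop", lambda: [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 1, "page": "/docs"},
    {"customer_id": 3, "page": "/home"},
])

layer = VirtualLayer()
layer.register("customers", customers)
layer.register("clicks", clicks)
joined = layer.join("customers", "clicks", key="customer_id")
```

In this sketch the join runs inside the integration layer, which is also where the performance concerns listed under the cons would show up as data volumes grow.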
Pros:
Extremely scalable and flexible architecture.
Workload optimized.
Easy to maintain.
Lower initial cost of deployment.
Cons:
Lack of governance can create too many silos and degrade performance.
Complex query processing can degrade over time.
Performance at the integration layer may need periodic maintenance.
Data loading is isolated across the layers. This provides a foundation to create a robust data
management strategy.
Data availability is controlled at each layer, and security rules can be implemented for each layer as
required, avoiding any associated overhead for other layers.
Data volumes can be managed across the individual layers of data based on the data type, the
life-cycle requirements for the data, and the cost of the storage.
Storage performance is driven by the data categories and their performance requirements, and the
storage tiers can be configured accordingly.
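
As a rough illustration of managing data across tiers by data type, life-cycle, and performance requirements, the sketch below assigns a dataset to a storage tier. Everything specific in it is an assumption made for the example: the tier names (hot, warm, cold), the Dataset fields, and the thresholds do not come from the text, and a real deployment would take them from its own life-cycle and cost policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    data_type: str        # "structured", "semi-structured", or "unstructured"
    age_days: int         # position in the data life cycle
    queries_per_day: int  # proxy for the performance requirement


def choose_tier(ds: Dataset) -> str:
    """Pick a storage tier; the thresholds are illustrative assumptions only."""
    # Recent, heavily queried data goes on the fastest (most expensive) storage.
    if ds.queries_per_day >= 100 and ds.age_days <= 90:
        return "hot"
    # Structured data, or anything under a year old, stays on mid-cost storage.
    if ds.data_type == "structured" or ds.age_days <= 365:
        return "warm"
    # Old, rarely used semi-structured or unstructured data goes to the cheapest tier.
    return "cold"


sales = Dataset("sales", "structured", age_days=30, queries_per_day=500)
old_orders = Dataset("old_orders", "structured", age_days=800, queries_per_day=1)
old_logs = Dataset("old_logs", "unstructured", age_days=400, queries_per_day=2)
tiers = [choose_tier(d) for d in (sales, old_orders, old_logs)]
```

Keeping the rule in one function makes it easy to adjust when the cost of storage or the life-cycle requirements change.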
 