New Technologies Applied to Data Warehousing - Data Warehousing in the Age of Big Data


From a business intelligence and data warehouse perspective, the applicability of cloud computing can be extended to reports, analytics, and dashboards. The other areas are still in nascent stages of cloud adoption. Big Data can be easily built on a cloud footprint and deployed by organizations, as most data types that will be classified as Big Data will not be organizational assets but rather data sets purchased from third-party sources; this includes social data, sensor data, and others.

Though cloud computing is in a mature state of evolution from an application perspective, there are still concerns about security and availability from a data warehouse perspective. In the next five years these issues will become nonissues, and the cloud will be a vibrant platform in most organizations.

Another emerging technology for next-generation data warehouses is the data virtualization platform.

Data virtualization

Another approach to solving the data integration challenge while leveraging all the investments in the current infrastructure is to deploy data virtualization to create a semantic data integration architecture. While the concept of data virtualization itself is not new, it has evolved over the years from enterprise application integration (EAI) to enterprise information integration (EII) to service-oriented architecture (SOA). The difference is that on EAI and SOA platforms the integration aspects were focused on applications and middleware, while the current engineering efforts are focused on data and analytics.

The key features of data virtualization that make it an attractive proposition include:

● Data presentation as a single colocated layer to the consumer.

● Data formatting to a uniform presentation layer via semantic transformations.

● Infrastructure abstraction.

● Pushdown performance optimization based on the source platform.

● Data as a service (DaaS) implementation, providing extreme scalability.
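
Two of the features listed above, the uniform presentation layer and pushdown optimization, can be illustrated with a minimal Python sketch. It is not how any particular data virtualization product works; the two in-memory SQLite sources, their table and column names, the `SOURCES` mapping, and the `query_customers` function are all assumptions made up for illustration.

```python
import sqlite3

# Two hypothetical physical sources with different schemas (illustrative only).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (cust_id INTEGER, full_name TEXT, country TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Ada Lovelace", "UK"), (2, "Grace Hopper", "US")])

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE clients (id INTEGER, name TEXT, ctry TEXT)")
erp.executemany("INSERT INTO clients VALUES (?, ?, ?)",
                [(10, "Alan Turing", "UK"), (11, "Edgar Codd", "US")])

# Semantic mapping (assumed): uniform field name -> physical column in each source.
SOURCES = [
    {"conn": crm, "table": "customers",
     "columns": {"customer_id": "cust_id", "name": "full_name", "country": "country"}},
    {"conn": erp, "table": "clients",
     "columns": {"customer_id": "id", "name": "name", "country": "ctry"}},
]

def query_customers(country):
    """Return rows in the uniform shape, pushing the filter down to each source."""
    rows = []
    for src in SOURCES:
        cols = src["columns"]
        # Rename physical columns to the uniform names inside the source query.
        select_list = ", ".join(f"{phys} AS {logical}" for logical, phys in cols.items())
        # The WHERE clause runs in the source engine, so only matching rows come back.
        sql = f"SELECT {select_list} FROM {src['table']} WHERE {cols['country']} = ?"
        cur = src["conn"].execute(sql, (country,))
        names = [d[0] for d in cur.description]
        rows.extend(dict(zip(names, r)) for r in cur.fetchall())
    return rows

uk_customers = query_customers("UK")
```

The consumer sees a single list of records with the same field names regardless of which source held them, and the country filter is evaluated by each source rather than after the data has been copied into the virtual layer.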

What is data virtualization?

Figure 9.4 shows the current data and analytics architecture and landscape that exist in many organizations. Due to the intrinsic limitations of the database technology and the underutilization of the technology stack, we often build too many silos of solutions and cannot scale up or integrate data in many situations. One technique that was tried to overcome these limitations was to create a data federation architecture. The shortcoming of the federated approach was the inability to scale the infrastructure when linking multiple instances of a database or datamarts. In the current-state architecture we lose a lot of business value and opportunities that could be harnessed from the current data. In addition, due to the nonstandard implementation of metadata across the layers, we cannot easily integrate any new data or Big Data without significant rework.

Figure 9.5 shows a potential future state based on implementing a data virtualization platform.

In this architecture, with the data integration architecture shifting to a data virtualization layer, there

are a few key platform features that can be leveraged for creating business value and scaling business

intelligence:

● Automated data discovery. Data virtualization can perform an automated data discovery exercise across the current data sources and new data sources and integrate the outputs in a metadata repository.
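
As a rough illustration of automated data discovery, the following Python sketch reads the catalog of each source and records what it finds in a shared metadata repository. The sources, the `discover` function, and the `metadata` table layout are hypothetical; a real platform would cover many source types and capture far richer metadata.

```python
import sqlite3

def discover(source_name, conn):
    """Read a SQLite source's catalog and return one metadata record per column."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            records.append((source_name, table, column, col_type))
    return records

# Hypothetical sources: an existing one and a newly added one.
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
social = sqlite3.connect(":memory:")
social.execute("CREATE TABLE posts (post_id INTEGER, body TEXT)")

# The metadata repository is itself just a table in this sketch (assumed layout).
repo = sqlite3.connect(":memory:")
repo.execute(
    "CREATE TABLE metadata "
    "(source TEXT, table_name TEXT, column_name TEXT, data_type TEXT)"
)
for name, conn in {"sales": sales, "social": social}.items():
    repo.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", discover(name, conn))
repo.commit()
```

Adding a new source then amounts to running the same discovery step against it, which is the property that lets new data be integrated without reworking the existing layers.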
