Integration of Big Data and Data Warehousing - Data Warehousing in the Age of Big Data


configuration can provide the administrators with tools and tips to zone the infrastructure to mark the data in its own area, minimizing both risk and performance impact.

● Data exploration and mining is a very common activity that drives Big Data acquisition across organizations, and it also produces large data sets as the output of processing. These data sets need to be maintained in the Big Data system by periodically sweeping and deleting intermediate data sets. Organizations often overlook this area, and over time it can become a performance drain.
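As a minimal sketch of the periodic sweep described above, the following Python assumes a hypothetical catalog in which each data set records whether it is an intermediate result of exploration and when it was last accessed. Intermediate data sets not touched within an assumed retention window are selected for deletion. The `Dataset` record, the helper names, and the seven-day default are illustrative assumptions, not part of any particular Big Data platform, and the sketch only selects candidates rather than deleting files.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Dataset:
    # Hypothetical catalog entry; real platforms track this metadata differently.
    name: str
    is_intermediate: bool
    last_accessed: datetime


def select_for_sweep(catalog, now, retention=timedelta(days=7)):
    """Return names of intermediate data sets not accessed within the retention window."""
    cutoff = now - retention
    return [
        d.name
        for d in catalog
        if d.is_intermediate and d.last_accessed < cutoff
    ]


def sweep(catalog, now, retention=timedelta(days=7)):
    """Split the catalog into retained entries and the sorted names selected for deletion."""
    doomed = set(select_for_sweep(catalog, now, retention))
    kept = [d for d in catalog if d.name not in doomed]
    return kept, sorted(doomed)


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    catalog = [
        Dataset("raw_clickstream", False, datetime(2023, 12, 1)),
        Dataset("tmp_sessionized", True, datetime(2024, 1, 2)),
        Dataset("tmp_feature_matrix", True, datetime(2024, 1, 29)),
    ]
    kept, deleted = sweep(catalog, now)
    print("deleted:", deleted)
    print("kept:", [d.name for d in kept])
```

Only entries flagged as intermediate are ever candidates, so source data such as `raw_clickstream` survives no matter how old it is; the retention window is the single knob an administrator would tune.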

Storage performance

● Disk performance is an important consideration when building Big Data systems, and the appliance model provides a clearer focus on the storage class and tiering architecture. This provides a starting point for longer-term planning and growth management of the storage infrastructure.

● If a combination of in-memory, SSD, and traditional storage is planned for Big Data processing, the persistence and exchange of data across the different layers can consume both processing time and cycles. This area needs careful attention, and the appliance architecture provides a reference for such complex storage requirements.
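To make the tiering and cross-layer exchange concerns concrete, here is a minimal Python sketch under loudly stated assumptions: data sets are placed on an in-memory, SSD, or traditional disk tier according to a hypothetical access-frequency policy, and the cost of moving a data set between tiers is approximated as its size divided by the slower tier's throughput. The thresholds, throughput figures, and function names are placeholders for illustration, not measurements of any real hardware.

```python
# Hypothetical sustained throughput per tier, in MB/s (placeholders, not benchmarks).
TIER_THROUGHPUT_MB_S = {"memory": 10_000, "ssd": 2_000, "disk": 200}


def choose_tier(accesses_per_day):
    """Assign a tier using an assumed access-frequency policy."""
    if accesses_per_day >= 100:
        return "memory"
    if accesses_per_day >= 10:
        return "ssd"
    return "disk"


def migration_seconds(size_mb, src, dst):
    """Approximate move time, bounded by the slower of the two tiers."""
    if src == dst:
        return 0.0
    bottleneck = min(TIER_THROUGHPUT_MB_S[src], TIER_THROUGHPUT_MB_S[dst])
    return size_mb / bottleneck


def plan_moves(datasets):
    """For each (name, size_mb, current_tier, accesses_per_day), propose a move and its cost."""
    plan = []
    for name, size_mb, current, accesses in datasets:
        target = choose_tier(accesses)
        if target != current:
            plan.append((name, current, target, migration_seconds(size_mb, current, target)))
    return plan


if __name__ == "__main__":
    for move in plan_moves([
        ("orders_2024", 50_000, "disk", 150),
        ("logs_2019", 400_000, "ssd", 1),
        ("dim_customer", 2_000, "memory", 500),
    ]):
        print(move)
```

The point of the sketch is that every tier change costs I/O time: a placement policy that shuffles large data sets frequently can consume much of the benefit the faster tier was meant to provide.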

Operational costs

Calculating the operational cost of a data warehouse and its Big Data platform is a complex task. It includes the initial acquisition costs for infrastructure, the labor costs of implementing the architecture, and the infrastructure and labor costs of ongoing maintenance, including external help commissioned from consultants and experts.
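The cost components listed above can be combined into a simple total-cost sketch. The Python below is a minimal illustration assuming a fixed planning horizon in years and flat annual maintenance figures; the field names mirror the categories in the paragraph, while the input values in the usage example are made up for illustration and are not estimates for any real platform.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    acquisition: float               # one-time infrastructure purchase
    implementation_labor: float      # one-time labor to implement the architecture
    annual_infra_maintenance: float  # recurring infrastructure upkeep
    annual_labor_maintenance: float  # recurring in-house labor
    annual_external_help: float      # consultants and experts commissioned each year


def total_cost(c: CostInputs, years: int) -> float:
    """One-time costs plus recurring maintenance over the planning horizon (undiscounted)."""
    if years < 0:
        raise ValueError("years must be non-negative")
    one_time = c.acquisition + c.implementation_labor
    recurring = (
        c.annual_infra_maintenance
        + c.annual_labor_maintenance
        + c.annual_external_help
    )
    return one_time + recurring * years


if __name__ == "__main__":
    # Illustrative inputs only.
    inputs = CostInputs(
        acquisition=500_000,
        implementation_labor=300_000,
        annual_infra_maintenance=80_000,
        annual_labor_maintenance=150_000,
        annual_external_help=40_000,
    )
    print(total_cost(inputs, years=3))
```

A fuller model would discount future years and account for growth in data volume; those refinements are deliberately left out here.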

External data integration

Figure 10.6 shows the external data integration approach to creating the next-generation data warehouse. In this approach, the existing data processing and data warehouse platforms are retained, and a new platform for processing Big Data is built on a new technology architecture. A data bus, developed using metadata and semantic technologies, creates a data integration environment for data exploration and processing.
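One way to picture the metadata-driven data bus is as a registry that maps business-level (semantic) terms to physical locations on either the existing warehouse or the new Big Data platform, so that exploration tools ask for a concept rather than a specific table or file path. The minimal Python sketch below assumes such a registry; the class name, platform labels, terms, and locations are all hypothetical, and real implementations would typically rely on a metadata repository or semantic layer rather than an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    platform: str       # hypothetical labels: "warehouse" (RDBMS) or "bigdata"
    physical_name: str  # table name or storage path on that platform


class DataBusRegistry:
    """Hypothetical metadata registry: semantic term -> physical location."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(term: str) -> str:
        # Normalize terms so lookups are case- and whitespace-insensitive.
        return term.strip().lower()

    def register(self, term: str, location: Location) -> None:
        key = self._key(term)
        if key in self._entries:
            raise ValueError(f"term already registered: {term}")
        self._entries[key] = location

    def resolve(self, term: str) -> Location:
        try:
            return self._entries[self._key(term)]
        except KeyError:
            raise KeyError(f"no metadata for term: {term}") from None

    def terms_on(self, platform: str):
        """List the normalized terms whose data lives on the given platform."""
        return sorted(t for t, loc in self._entries.items() if loc.platform == platform)


if __name__ == "__main__":
    bus = DataBusRegistry()
    bus.register("Customer", Location("warehouse", "dw.dim_customer"))
    bus.register("Web Session", Location("bigdata", "/data/sessions/"))
    print(bus.resolve("customer"))
    print(bus.terms_on("bigdata"))
```

Keeping the mapping in metadata means a data set can later move between platforms by updating one registry entry, without changing the tools that consume it.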

In this architecture, workload processing is clearly divided: Big Data is processed on its own infrastructure, and the current-state data warehouse runs on its own infrastructure. Separating the workloads helps maintain performance and data quality, but it moves complexity into the data bus architecture, which can range from a simple layer to an overwhelmingly complex layer of processing. The data bus is a custom-built solution for each system integrated into the data warehouse architecture, and it requires substantial data architecture skills and ongoing maintenance. Because Big Data processing takes place outside the RDBMS platform, this approach offers the opportunity to achieve virtually unlimited scalability at a lower price point.

● Pros:

  ● Scalable design for both RDBMS and Big Data processing.

  ● Reduced processing overload.

  ● Complexity of processing can be isolated across data acquisition, data cleansing, data discovery, and data integration.
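The last advantage, isolating complexity across acquisition, cleansing, discovery, and integration, can be sketched as a pipeline of independent stage functions, each of which can be developed, tested, and replaced on its own. The Python below is a minimal illustration under assumed record shapes: the stage names follow the list, but the record fields (`customer_id`, `amount`, `source`) and the rules inside each stage are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]


def acquire(records: List[Record]) -> List[Record]:
    # Acquisition: tag each raw record with its (assumed) source system.
    return [{**r, "source": r.get("source", "unknown")} for r in records]


def cleanse(records: List[Record]) -> List[Record]:
    # Cleansing: drop records missing a key and normalize the key's format.
    return [
        {**r, "customer_id": str(r["customer_id"]).strip().upper()}
        for r in records
        if r.get("customer_id") not in (None, "")
    ]


def discover(records: List[Record]) -> List[Record]:
    # Discovery: derive a simple attribute (hypothetical threshold) for exploration.
    return [{**r, "high_value": float(r.get("amount", 0)) >= 1000} for r in records]


def integrate(records: List[Record]) -> List[Record]:
    # Integration: collapse duplicates on the business key, keeping the last one seen.
    merged: Dict[str, Record] = {}
    for r in records:
        merged[str(r["customer_id"])] = r
    return list(merged.values())


def run_pipeline(records: Iterable[Record], stages: List[Stage]) -> List[Record]:
    # Each stage only sees the previous stage's output, so stages stay decoupled.
    data = list(records)
    for stage in stages:
        data = stage(data)
    return data


if __name__ == "__main__":
    raw = [
        {"customer_id": " c1 ", "amount": 1500, "source": "web"},
        {"customer_id": "", "amount": 20},
        {"customer_id": "C1", "amount": 200},
    ]
    print(run_pipeline(raw, [acquire, cleanse, discover, integrate]))
```

Because every stage consumes and returns a plain list of records, a cleansing rule can change without touching acquisition or integration code, which is the kind of isolation the list describes.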
