Workload Management in the Data Warehouse - Data Warehousing in the Age of Big Data


● Each of these steps brings a workload characteristic:

● Discovery mandates interrogation of the data by users. The data needs to be processed where it resides rather than moved across the network, because of its size and complexity. This requirement is a design goal of Big Data architecture: compute and process data at the storage layer.

● Analysis mandates parsing the data with data visualization tools. This requires minimal transformation and minimal movement of data across the network.

● Analytics requires converting the data to a structured format and extracting it to the data warehouse or analytical engines for processing.

● Big Data workloads are drastically different from traditional workloads because no database is involved in the processing of Big Data. This removes a large scalability constraint but adds the complexity of maintaining consistency at the file system level. Another key factor to remember is that processing Big Data involves data processing, not transaction processing. These factors are the design considerations when building a Big Data system, which we will discuss in Chapters 10 and 11.

● From an analytical perspective, Big Data workloads are very similar to adding new data to the data warehouse. The key difference is that the tables added are of the narrow/narrow type, yet their impact on the analytical model can be that of a wide/narrow table becoming wide/wide.

● Big Data query workloads are closer to program execution of MapReduce code, which is the opposite of executing SQL and optimizing for SQL performance.
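
To make the contrast with SQL concrete, here is a minimal sketch of a MapReduce-style job written as ordinary Python functions. It does not use Hadoop or any real framework; the sample log lines, the `action` field, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are all assumptions made for illustration. The point is only that the "query" is a program with explicit map, group, and reduce steps rather than a declarative statement.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw input: one unstructured log line per record.
LOG_LINES = [
    "2013-01-05 user=alice action=view",
    "2013-01-05 user=bob action=buy",
    "2013-01-06 user=alice action=buy",
]


def map_phase(line):
    """Parse one raw line and emit (key, 1) pairs keyed by the action field."""
    fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
    if "action" in fields:
        yield fields["action"], 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Roughly equivalent SQL, if the data were already in a table:
#   SELECT action, COUNT(*) FROM log GROUP BY action;
print(run_job(LOG_LINES))
```

In the SQL version the optimizer decides how to scan and group; in the programmatic version those decisions live in the code, which is where the tuning effort goes.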

The major difference in Big Data workload management is that tuning the data processing bottlenecks results in linear scalability and immediate outcomes, unlike in the traditional RDBMS world of data management. This is due to the file-based processing of data, the self-contained nature of the data, and the maturity of the algorithms on the infrastructure itself.

Technology choices

As we think about how to design the next generation of data warehouses around the concept of a workload-driven architecture, several technologies that have emerged in the last decade are critical to consider. A key point to remember is that the concept of data warehousing is not changing; rather, the deployment and architecture of the data warehouse will evolve from being tightly coupled to the database and its infrastructure to being distributed across different layers of infrastructure and data architecture.

The goal of building the workload-driven architecture is to channel all of these technology improvements into the flexibility and scalability of the data warehouse and Big Data processing, thereby creating a coexistence platform that leverages all current-state and future-state investments for better ROI. Another viewpoint to consider is that, by design, Big Data processing is built around procedural processing (more akin to programming language-driven processing), which can take full advantage of multicore CPU and SSD or DRAM technologies, as opposed to the RDBMS architecture, where large cycles of processing and memory are left underutilized.
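
As an illustration of procedural processing spread across CPU cores, the following sketch splits a data set into self-contained partitions, processes each partition in a separate worker process, and merges the partial results. It uses only Python's standard `concurrent.futures` module; the sample data and the function names (`split_into_partitions`, `summarize_partition`, `merge`, `run_parallel`) are assumptions for illustration and are not tied to any particular Big Data platform.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_into_partitions(records, n_parts):
    """Split records into n_parts roughly equal, self-contained chunks."""
    return [records[i::n_parts] for i in range(n_parts)]


def summarize_partition(partition):
    """Procedural per-partition work: return (count, total) for the chunk."""
    count = 0
    total = 0
    for value in partition:
        count += 1
        total += value
    return count, total


def merge(summaries):
    """Combine the per-partition summaries into one overall result."""
    count = sum(c for c, _ in summaries)
    total = sum(t for _, t in summaries)
    return count, total


def run_parallel(records, workers=None):
    """Process one partition per worker process, then merge the results."""
    workers = workers or os.cpu_count() or 1
    partitions = split_into_partitions(records, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(summarize_partition, partitions))
    return merge(summaries)


if __name__ == "__main__":
    data = list(range(1, 1001))  # hypothetical sample values
    print(run_parallel(data))
```

Because each partition is processed independently, adding workers adds throughput as long as the partitions stay balanced and the merge step stays cheap.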
