Workload Management in the Data Warehouse - Data Warehousing in the Age of Big Data

weight to the already voluminous data warehouse from a workload perspective, causing overwhelming workloads and underperforming systems. Distributing the workload does not improve scalability or reduce workload as one might anticipate, since each distribution comes with limited scalability of its own.

New workloads and Big Data

Big Data brings a new definition to the world of workloads. Apart from the traditional challenges that exist in the world of data, the volume, velocity, variety, complexity, and ambiguous nature of Big Data create a new class of challenges and issues. The key challenges and issues that we need to understand regarding data in the Big Data world include:

● Data has no fixed architecture and can arrive in multiple formats.

● Data is self-contained, and several external business rules must be created to interpret and process it.

● Data has little or no concept of referential integrity.

● Data is not relational.

● Data needs more analytical processing.

● Data depends on metadata for creating context.

● Data has no specific bounds on volume or complexity.

● Data is semi-structured or unstructured.

● Data needs multiple cycles of processing, but each cycle needs to be processed in one pass due to the size of the data.

● Data needs business rules for processing, as structured data does today, but these rules need to be created in a rules engine architecture rather than in the database or the ETL tool.

● Data needs more governance than data in the database.

● Data has no defined quality.
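Two of the points above, business rules held in a rules engine rather than in the database or ETL tool, and each processing cycle completing in one pass, can be illustrated with a minimal sketch. Everything named here (`RULES`, `apply_rules`, the field names `text`, `customer_id`, `category`, and `needs_review`) is a hypothetical choice for illustration, not something the text prescribes.

```python
import json
from typing import Callable, Iterable, Iterator

# A rule is a (name, predicate, action) triple kept outside any database or ETL tool.
Rule = tuple[str, Callable[[dict], bool], Callable[[dict], dict]]

# Hypothetical rules, for illustration only.
RULES: list[Rule] = [
    ("tag_complaint",
     lambda rec: "refund" in str(rec.get("text", "")).lower(),
     lambda rec: {**rec, "category": "complaint"}),
    ("flag_missing_customer",
     lambda rec: not rec.get("customer_id"),
     lambda rec: {**rec, "needs_review": True}),
]


def apply_rules(lines: Iterable[str], rules: list[Rule]) -> Iterator[dict]:
    """Stream the input once, applying every matching rule to each record."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            rec = None
        if not isinstance(rec, dict):
            # Input with no usable structure is kept as raw text.
            rec = {"text": line.strip()}
        for _name, matches, action in rules:
            if matches(rec):
                rec = action(rec)
        yield rec


sample = [
    '{"customer_id": "C1", "text": "I want a refund"}',
    "free-form note with no structure",
]
results = list(apply_rules(sample, RULES))
```

Because `apply_rules` is a generator, a cycle reads each record exactly once and never holds the full data set in memory; a later cycle with a different rule list would simply make another single pass.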

Big Data workloads

Workload management for Big Data is completely different from workload management for traditional data. The major areas where workload definitions must be understood for design and processing efficiency include:

● Data is file based for acquisition and storage: whether you choose Hadoop, NoSQL, or any other technique, most Big Data is file based. The underlying reason for choosing file-based management is the ease of managing files, replication, and the ability to store any format of data for processing.

● Data processing will happen in three steps:

   1. Discovery: in this step the data is analyzed and categorized.

   2. Analysis: in this step the data is associated with master data and metadata.

   3. Analytics: in this step the data is converted to metrics and structured.
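The three processing steps can be sketched as a small chained pipeline. This is only an illustration under stated assumptions: the function names (`discover`, `analyze`, `to_metrics`), the `MASTER_PRODUCTS` lookup, and the sample documents are all hypothetical, and real discovery or analysis logic would be far richer.

```python
from collections import Counter

# Hypothetical master data: maps a product keyword to a product ID.
MASTER_PRODUCTS = {"router": "P-100", "modem": "P-200"}


def discover(raw_docs):
    """Discovery: inspect each document and assign a coarse category."""
    for doc in raw_docs:
        text = doc.strip()
        category = "json" if text.startswith("{") else "text"
        yield {"raw": text, "category": category}


def analyze(discovered, master):
    """Analysis: associate each document with master data and metadata."""
    for item in discovered:
        lowered = item["raw"].lower()
        product_ids = sorted(pid for kw, pid in master.items() if kw in lowered)
        yield {**item, "product_ids": product_ids, "length": len(item["raw"])}


def to_metrics(analyzed):
    """Analytics: reduce the enriched documents to structured metrics."""
    mentions = Counter()
    for item in analyzed:
        mentions.update(item["product_ids"])
    return dict(mentions)


docs = [
    "My router keeps dropping the connection",
    '{"note": "modem and router replaced"}',
    "great service",
]
metrics = to_metrics(analyze(discover(docs), MASTER_PRODUCTS))
```

Each stage consumes the previous stage's output lazily, so the documents are categorized, enriched, and counted without an intermediate copy of the whole collection.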
