weight to the already voluminous data warehouse from a workload perspective, causing overwhelming
workloads and underperforming systems. Distributing the workload does not improve scalability or
reduce workload as one might anticipate, because each distribution comes with limited
scalability.
New workloads and Big Data
Big Data brings a new definition to the world of workloads. Apart from the traditional challenges
that exist in the world of data, the volume, velocity, variety, complexity, and ambiguous nature of Big
Data create a new class of challenges and issues. The key challenges and issues that we need
to understand regarding data in the Big Data world include:
Data does not have a finite architecture and can have multiple formats.
Data is self-contained and needs several external business rules to be created to interpret and
process the data.
Data has a minimal or zero concept of referential integrity.
Data is not relational.
Data needs more analytical processing.
Data depends on metadata for creating context.
Data has no fixed volume or complexity.
Data is semi-structured or unstructured.
Data needs multiple cycles of processing, but each cycle needs to be processed in one pass due to
the size of the data.
Data needs business rules for processing, just as structured data does today, but these rules need
to be created in a rules engine architecture rather than in the database or the ETL tool.
Data needs more governance than data in the database.
Data has no defined quality.
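Two of the points above, processing each cycle in a single pass and keeping business rules in a rules engine rather than in the database or the ETL tool, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the Rule structure, the apply_rules function, the two sample rules, and the newline-delimited JSON input are hypothetical and do not come from any particular product.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence


@dataclass(frozen=True)
class Rule:
    """A business rule held outside the storage layer (hypothetical structure)."""
    name: str
    applies: Callable[[dict], bool]  # predicate: does the rule fire for this record?
    action: Callable[[dict], dict]   # transformation applied when the rule fires


def apply_rules(lines: Iterable[str], rules: Sequence[Rule]) -> Iterator[dict]:
    """Stream the input once; all rules are evaluated during that single pass."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Input has no defined quality: keep unparseable lines, flagged.
            record = {"raw": line, "parse_error": True}
        if not isinstance(record, dict):
            # Valid JSON that is not an object (a bare number, a list, ...).
            record = {"raw": record}
        for rule in rules:
            if rule.applies(record):
                record = rule.action(record)
        yield record


# Illustrative rules only; real rules would come from the business.
RULES = [
    Rule("tag_source",
         applies=lambda r: "source" not in r,
         action=lambda r: {**r, "source": "unknown"}),
    Rule("normalize_amount",
         applies=lambda r: isinstance(r.get("amount"), str),
         action=lambda r: {**r, "amount": float(r["amount"])}),
]

sample = [
    '{"id": 1, "amount": "19.90", "source": "web"}',
    '{"id": 2, "note": "free text, no amount"}',
    'not json at all',
]
results = list(apply_rules(sample, RULES))
for record in results:
    print(record)
```

Because apply_rules is a generator, the same function could read from an open file handle instead of a list, so the data never has to fit in memory. The rules live in an ordinary list, which means they can be changed without touching the code that reads or stores the data.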
Big Data workloads
Workload management for Big Data is completely different from workload management for traditional
data. The major areas where workload definitions must be understood for design and
processing efficiency include:
Data is file based for acquisition and storage. Whether you choose Hadoop, NoSQL, or any
other technique, most Big Data is file based. The underlying reason for choosing file-based
management is the ease of managing and replicating files and the ability to store data of any format
for processing.
Data processing will happen in three steps:
1. Discovery: in this step the data is analyzed and categorized.
2. Analysis: in this step the data is associated with master data and metadata.
3. Analytics: in this step the data is converted to metrics and structured.
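The three steps above can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the function names (discover, analyze, to_metrics), the MASTER_CUSTOMERS lookup table, and the sample records are all hypothetical, and a real implementation would run on a distributed platform rather than in a single process.

```python
from collections import Counter

# Hypothetical master data: customer ids mapped to a sales region.
MASTER_CUSTOMERS = {"c1": "EMEA", "c2": "APAC"}


def discover(record: dict) -> dict:
    """Step 1 (Discovery): inspect the record and assign a category."""
    if "customer_id" in record and "amount" in record:
        category = "transaction"
    elif "text" in record:
        category = "document"
    else:
        category = "unclassified"
    return {**record, "category": category}


def analyze(record: dict) -> dict:
    """Step 2 (Analysis): associate the record with master data."""
    region = MASTER_CUSTOMERS.get(record.get("customer_id"), "unknown")
    return {**record, "region": region}


def to_metrics(records) -> dict:
    """Step 3 (Analytics): reduce enriched records to structured metrics."""
    per_category = Counter()
    revenue_by_region = Counter()
    for rec in records:
        per_category[rec["category"]] += 1
        if rec["category"] == "transaction":
            revenue_by_region[rec["region"]] += rec["amount"]
    return {
        "per_category": dict(per_category),
        "revenue_by_region": dict(revenue_by_region),
    }


raw = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c2", "amount": 5.0},
    {"customer_id": "c9", "amount": 2.5},   # not present in the master data
    {"text": "support call transcript"},
]

# The generator expression chains the steps, so each record is read only once.
metrics = to_metrics(analyze(discover(r)) for r in raw)
print(metrics)
```

Keeping each step as a separate function mirrors the separation described in the text: categorization rules, master data lookups, and metric definitions can each change without affecting the other two.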
 