Abstracted layers of hierarchy: the most complex area in Big Data processing is the hidden layers of hierarchy. Textual and semi-structured data, images and video, and documents converted from audio conversations all carry context, and without appropriate contextualization the associated hierarchy cannot be processed. Incorrect hierarchy attribution will produce data sets that may not be relevant.
Lack of metadata: there is no metadata within the documents or files that contain Big Data. While this is not unusual, it poses challenges when attributing metadata to the data during processing. Taxonomies and semantic libraries are useful for tagging the data and subsequently processing it (a minimal sketch follows this list).
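As an illustration of the tagging idea, here is a minimal Python sketch; the taxonomy contents and the tag_document() function are hypothetical examples, not part of any specific taxonomy product or semantic library.

# Minimal sketch: tagging metadata-poor text with a taxonomy lookup.
# TAXONOMY and tag_document() are illustrative, not a real library API.
TAXONOMY = {
    "invoice": ["finance", "accounts-payable"],
    "claim": ["insurance", "claims-processing"],
}

def tag_document(text):
    """Scan raw text for taxonomy terms and return the matching tags."""
    words = set(text.lower().split())
    tags = set()
    for term, categories in TAXONOMY.items():
        if term in words:
            tags.update(categories)
    return sorted(tags)

doc = "Attached is the invoice for the insurance claim filed in March."
print(tag_document(doc))
# ['accounts-payable', 'claims-processing', 'finance', 'insurance']

In practice the lookup would be far richer (synonyms, word stems, hierarchy paths), but the principle is the same: the tags supply the metadata that the raw files lack.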
Processing limitations
There are two notable limitations when processing Big Data:
Write-once model: with Big Data there is no update logic, owing to the intrinsic nature of the data being processed; changed data is treated and processed as new data (see the first sketch after this list).
Data fracturing: because of the intrinsic storage design, data can be fractured across the Big Data infrastructure. Processing logic needs to know the metadata schema that was used when loading the data; if this match is missed, errors can creep into the processing (see the second sketch after this list).
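The following minimal sketch illustrates the write-once model; the AppendOnlyStore class and its file layout are assumptions for illustration, not any particular platform's API. A change to a record arrives as a new record, and readers resolve the current value by timestamp.

import json
import time

class AppendOnlyStore:
    # Illustrative write-once store: records are only ever appended;
    # an update arrives as a new record with a later timestamp.
    def __init__(self, path):
        self.path = path

    def write(self, key, value):
        record = {"key": key, "value": value, "ts": time.time()}
        with open(self.path, "a") as f:  # append only, never overwrite
            f.write(json.dumps(record) + "\n")

    def latest(self, key):
        # Resolve the current value by taking the newest record for the key.
        current = None
        with open(self.path) as f:
            for line in f:
                record = json.loads(line)
                if record["key"] == key:
                    current = record
        return current["value"] if current else None

store = AppendOnlyStore("events.log")
store.write("customer-42", {"status": "active"})
store.write("customer-42", {"status": "closed"})  # change arrives as new data
print(store.latest("customer-42"))  # {'status': 'closed'}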
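For data fracturing, here is a minimal sketch of the schema check described above; the schema registry, field names, and block layout are hypothetical.

# Hypothetical registry mapping a schema id to its expected fields.
SCHEMAS = {
    "clickstream_v1": {"user_id", "url", "ts"},
    "clickstream_v2": {"user_id", "url", "ts", "session_id"},
}

def process_block(block, expected_schema):
    # Refuse to process a fractured block that was loaded under a
    # different metadata schema than the processor expects.
    if block["schema_id"] != expected_schema:
        raise ValueError(
            "block loaded with %r, processor expects %r"
            % (block["schema_id"], expected_schema)
        )
    fields = SCHEMAS[expected_schema]
    return [row["url"] for row in block["rows"] if fields <= row.keys()]

block = {
    "schema_id": "clickstream_v2",
    "rows": [{"user_id": 1, "url": "/home", "ts": 0, "session_id": "a"}],
}
print(process_block(block, "clickstream_v2"))  # ['/home']

Carrying the schema identifier with each block makes the load-time schema explicit, so a mismatch surfaces as an error rather than as silently corrupted output.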
Big Data processing can involve combinations of these limitations and complexities, all of which need to be accommodated when processing the data. The next section discusses the steps in processing Big Data.
Processing Big Data
Big Data processing involves steps very similar to processing data in transactional or data warehouse environments. Figure 11.5 shows the different stages involved in processing Big Data; the overall approach, sketched in code after the following list, is:
Gather the data.
Analyze the data.
Process the data.
Distribute the data.
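The four stages can be read as a simple pipeline. The following Python sketch is purely illustrative (the stage functions and sample data are placeholders, not drawn from the text); note in particular that analysis, including standardization and enrichment, happens before processing.

def gather(sources):
    # Acquire raw data from files, feeds, logs, and so on.
    return [record for source in sources for record in source]

def analyze(records):
    # Standardize and enrich using metadata, master data, and
    # semantic libraries; this prepares the data for processing.
    return [{"text": r.strip().lower(), "tags": ["placeholder"]} for r in records]

def process(records):
    # Apply the actual transformation or computation logic.
    return [r for r in records if r["tags"]]

def distribute(records):
    # Hand results to downstream consumers, e.g. the data warehouse.
    for r in records:
        print("-> warehouse:", r)

sources = [["  Order PLACED ", "order shipped"], ["ORDER delivered"]]
distribute(process(analyze(gather(sources))))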
While the stages are similar to traditional data processing, the key differences are:
Data is first analyzed and then processed.
Data standardization occurs in the analyze stage, which forms the foundation for the distribute stage, where data warehouse integration happens.
There is no special emphasis on data quality beyond the use of metadata, master data, and semantic libraries to enhance and enrich the data.
Data is prepared in the analyze stage for further processing and integration.
The stages and their activities are described in the following sections in detail, including the use
of metadata, master data, and governance processes.