Abstracted layers of hierarchy: the most complex area in Big Data processing is the hidden layers of hierarchy. Textual and semi-structured data, images and video, and documents converted from audio conversations all carry context, and without appropriate contextualization the associated hierarchy cannot be processed. Incorrect hierarchy attribution will produce data sets that may not be relevant.
Lack of metadata: there is no metadata within the documents or files that contain Big Data. While this is not unusual, it poses challenges when attributing metadata to the data during processing. Taxonomies and semantic libraries are useful for tagging the data and subsequently processing it (a minimal sketch follows this list).
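As an illustration of the tagging idea, here is a minimal Python sketch; the taxonomy contents and the tag_document() function are hypothetical examples, not part of any specific taxonomy product or semantic library.

# Minimal sketch: tagging metadata-poor text with a taxonomy lookup.
# TAXONOMY and tag_document() are illustrative, not a real library API.
TAXONOMY = {
    "invoice": ["finance", "accounts-payable"],
    "claim": ["insurance", "claims-processing"],
}

def tag_document(text):
    """Scan raw text for taxonomy terms and return the matching tags."""
    words = set(text.lower().split())
    tags = set()
    for term, categories in TAXONOMY.items():
        if term in words:
            tags.update(categories)
    return sorted(tags)

doc = "Attached is the invoice for the insurance claim filed in March."
print(tag_document(doc))
# ['accounts-payable', 'claims-processing', 'finance', 'insurance']

In practice the lookup would be far richer (synonyms, word stems, hierarchy paths), but the principle is the same: the tags supply the metadata that the raw files lack.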
Processing limitations
There are two notable limitations when processing Big Data:
Write-once model: with Big Data there is no update logic, owing to the intrinsic nature of the data being processed; changed data is treated and processed as new data (see the first sketch after this list).
Data fracturing: because of the intrinsic storage design, data can be fractured across the Big Data infrastructure. Processing logic needs to know the metadata schema that was used when loading the data; if this match is missed, errors can creep into the processing (see the second sketch after this list).
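The following minimal sketch illustrates the write-once model; the AppendOnlyStore class and its file layout are assumptions for illustration, not any particular platform's API. A change to a record arrives as a new record, and readers resolve the current value by timestamp.

import json
import time

class AppendOnlyStore:
    # Illustrative write-once store: records are only ever appended;
    # an update arrives as a new record with a later timestamp.
    def __init__(self, path):
        self.path = path

    def write(self, key, value):
        record = {"key": key, "value": value, "ts": time.time()}
        with open(self.path, "a") as f:  # append only, never overwrite
            f.write(json.dumps(record) + "\n")

    def latest(self, key):
        # Resolve the current value by taking the newest record for the key.
        current = None
        with open(self.path) as f:
            for line in f:
                record = json.loads(line)
                if record["key"] == key:
                    current = record
        return current["value"] if current else None

store = AppendOnlyStore("events.log")
store.write("customer-42", {"status": "active"})
store.write("customer-42", {"status": "closed"})  # change arrives as new data
print(store.latest("customer-42"))  # {'status': 'closed'}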
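For data fracturing, here is a minimal sketch of the schema check described above; the schema registry, field names, and block layout are hypothetical.

# Hypothetical registry mapping a schema id to its expected fields.
SCHEMAS = {
    "clickstream_v1": {"user_id", "url", "ts"},
    "clickstream_v2": {"user_id", "url", "ts", "session_id"},
}

def process_block(block, expected_schema):
    # Refuse to process a fractured block that was loaded under a
    # different metadata schema than the processor expects.
    if block["schema_id"] != expected_schema:
        raise ValueError(
            "block loaded with %r, processor expects %r"
            % (block["schema_id"], expected_schema)
        )
    fields = SCHEMAS[expected_schema]
    return [row["url"] for row in block["rows"] if fields <= row.keys()]

block = {
    "schema_id": "clickstream_v2",
    "rows": [{"user_id": 1, "url": "/home", "ts": 0, "session_id": "a"}],
}
print(process_block(block, "clickstream_v2"))  # ['/home']

Carrying the schema identifier with each block makes the load-time schema explicit, so a mismatch surfaces as an error rather than as silently corrupted output.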
Big Data processing can involve combinations of these limitations and complexities, all of which need to be accommodated when processing the data. The next section discusses the steps in processing Big Data.
Processing Big Data
Big Data processing involves steps very similar to processing data in transactional or data warehouse environments. Figure 11.5 shows the different stages involved in processing Big Data; the overall approach, sketched in code after the following list, is:
Gather the data.
Analyze the data.
Process the data.
Distribute the data.
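The four stages can be read as a simple pipeline. The following Python sketch is purely illustrative (the stage functions and sample data are placeholders, not drawn from the text); note in particular that analysis, including standardization and enrichment, happens before processing.

def gather(sources):
    # Acquire raw data from files, feeds, logs, and so on.
    return [record for source in sources for record in source]

def analyze(records):
    # Standardize and enrich using metadata, master data, and
    # semantic libraries; this prepares the data for processing.
    return [{"text": r.strip().lower(), "tags": ["placeholder"]} for r in records]

def process(records):
    # Apply the actual transformation or computation logic.
    return [r for r in records if r["tags"]]

def distribute(records):
    # Hand results to downstream consumers, e.g. the data warehouse.
    for r in records:
        print("-> warehouse:", r)

sources = [["  Order PLACED ", "order shipped"], ["ORDER delivered"]]
distribute(process(analyze(gather(sources))))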
While the stages are similar to traditional data processing, the key differences are:
Data is first analyzed and then processed.
Data standardization occurs in the analyze stage, which forms the foundation for the distribute stage, where data warehouse integration happens.
There is no special emphasis on data quality beyond the use of metadata, master data, and semantic libraries to enhance and enrich the data.
Data is prepared in the analyze stage for further processing and integration.
The stages and their activities are described in the following sections in detail, including the use
of metadata, master data, and governance processes.