Data-Driven Architecture for Big Data - Data Warehousing in the Age of Big Data

FIGURE 11.5 Processing Big Data.

Gather stage

Data is acquired from multiple sources, including real-time systems, near-real-time systems, and batch-oriented applications. The data is collected and loaded into a storage environment such as Hadoop or a NoSQL store. Another option is to process the data through a knowledge discovery platform and store only the output rather than the whole data set.
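The two loading options above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `gather` and `summarize` functions, the record shape, and the in-memory lists standing in for Hadoop or a NoSQL store are all hypothetical names introduced here.

```python
from typing import Callable, Optional


def gather(
    feeds: dict,
    store: list,
    discover: Optional[Callable[[dict], dict]] = None,
) -> None:
    """Collect records from every feed and load them into `store`.

    If `discover` is given, only its output is stored, not the raw record.
    """
    for source, payloads in feeds.items():
        for payload in payloads:
            record = {"source": source, "payload": payload}
            store.append(discover(record) if discover else record)


def summarize(record: dict) -> dict:
    """Toy knowledge-discovery step: keep the source and a word count only."""
    return {"source": record["source"], "words": len(record["payload"].split())}


# Hypothetical feeds, one per kind of source named in the text.
feeds = {
    "real-time": ["sensor reading 42"],
    "near-real-time": ["order 1001 shipped to customer"],
    "batch": ["nightly export of call center notes"],
}

raw_store = []      # stands in for Hadoop or a NoSQL store (assumption)
gather(feeds, raw_store)

derived_store = []  # holds only the discovery output, not the whole data set
gather(feeds, derived_store, discover=summarize)
```

Here `raw_store` ends up with the complete records, while `derived_store` keeps only the much smaller discovery output. The trade-off is storage volume against the ability to reprocess the original data later.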

Analysis stage

The analysis stage is the data discovery stage for processing Big Data and preparing it for integration into the structured analytical platforms or the data warehouse. The analysis stage consists of tagging, classification, and categorization of data, which closely resembles the subject area creation and data model definition stage in the data warehouse.

● Tagging: a common practice that has been prevalent on the Internet since 2003 for data sharing. Tagging is the process of applying a term to an unstructured piece of information to give the data a metadata-like attribution. Tagging creates a rich, nonhierarchical data set that can be used to process the data downstream in the process stage.

● Classify: unstructured data comes from multiple sources and is stored during the gathering process. Classification groups the data into subject-oriented data sets for ease of processing. For example, classifying all customer data in one group helps optimize the processing of unstructured customer data.

● Categorize: categorization is the external organization of data from a storage perspective, where the data is physically grouped first by its classification and then by its data type. Categorization is useful in managing the life cycle of the data, since the data is stored in a write-once model in the storage layer.
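The three analysis steps can be sketched as a small Python pipeline. This is an illustrative sketch only: the keyword-to-tag table, the tag-to-subject-area table, the item shape, and the rule of classifying by the first tag are assumptions made for the example, not something prescribed by the text.

```python
from collections import defaultdict

# Hypothetical lookup tables, invented for this example.
TAG_KEYWORDS = {
    "refund": "billing",
    "invoice": "billing",
    "customer": "customer",
    "login": "support",
}
SUBJECT_AREAS = {
    "billing": "customer",
    "customer": "customer",
    "support": "operations",
}


def tag(item: dict) -> list:
    """Tagging: attach flat, nonhierarchical terms to an unstructured item."""
    words = set(item["text"].lower().split())
    return sorted({t for keyword, t in TAG_KEYWORDS.items() if keyword in words})


def classify(tags: list) -> str:
    """Classification: map tags to one subject-oriented group.

    Simplifying assumption: the first tag (alphabetically) decides the group.
    """
    return SUBJECT_AREAS[tags[0]] if tags else "unclassified"


def categorize(items: list) -> dict:
    """Categorization: group items by classification, then by data type."""
    layout = defaultdict(list)
    for item in items:
        tags = tag(item)
        key = (classify(tags), item["type"])
        # Build a new record instead of mutating the input (write-once style).
        layout[key].append({**item, "tags": tags})
    return dict(layout)


items = [
    {"text": "Customer asked for a refund", "type": "email"},
    {"text": "Login page fails on mobile", "type": "ticket"},
    {"text": "Invoice attached for customer review", "type": "email"},
    {"text": "Weather was nice today", "type": "tweet"},
]

layout = categorize(items)
```

Each key of `layout` is a `(classification, data type)` pair, mirroring the physical grouping described under Categorize. In a real storage layer the key would more likely become a directory or partition path than a dictionary key.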
