data volume, user demands, and total cost. A shared-nothing architecture, such as a data warehouse
appliance, is better suited to processing data in the data warehouse environment.
The world of Big Data processing differs completely from the transaction-processing world, both in
the types of data involved and in the architecture required to process that data.
Big Data processing
Big Data is neither structured nor finite in state and volume. As discussed in Chapter 2, it comes in
many different formats and from many different sources, all of which need to be processed as Big
Data. The processing complexities of Big Data include the following:
1. Data volume—the amount of data generated every day, both within and outside the organization.
Internal data includes memos, contracts, analyst reports, competitive research, financial
statements, emails, call center data, supplier data, vendor data, customer data, and confidential
and sensitive data including HR and legal.
External data includes articles, videos, blogs, analyst reviews, forums, social media, sensor
networks, and mobile data.
2. Data variety—the different formats of data generated by different sources.
Excel spreadsheets and the associated formulas
Documents
Blogs and microblogs
Videos, images, and audio
Multilingual data
Mobile, sensor, and radio-frequency identification (RFID) data
3. Data ambiguity—the complexity of the data and the ambiguity surrounding its metadata and
granularity.
Comma-Separated Values (CSV) files may or may not contain header rows (see the sketch after this list)
Word documents come in multiple formats (e.g., a hospital's legal documents for patients versus
its pharmaceutical documentation)
Sensor data from mobile versus RFID networks
Microblog data from Twitter versus data from Facebook
4. Data velocity—the speed at which data is generated.
Sensor networks
Mobile devices
Social media
YouTube broadcasts
Streaming services such as Netflix and Hulu
Corporate documents and systems
Patient networks
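The CSV case under data ambiguity is the simplest to illustrate: without reliable metadata, a loader has to guess whether the first row is a header before it can assign column names. The following is a minimal sketch in Python using the standard library's csv module; the function name read_csv_with_unknown_header and the file name data.csv are illustrative only, not part of any particular product.

import csv

def read_csv_with_unknown_header(path):
    """Load a CSV file whose header row may or may not be present."""
    with open(path, newline="") as f:
        sample = f.read(4096)                     # small sample for detection
        f.seek(0)
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(sample)           # guess delimiter and quoting
        has_header = sniffer.has_header(sample)   # heuristic check for a header row
        if has_header:
            return list(csv.DictReader(f, dialect=dialect))   # rows keyed by column name
        return list(csv.reader(f, dialect=dialect))           # plain lists of values

# Example usage (data.csv is a hypothetical file):
# rows = read_csv_with_unknown_header("data.csv")

In a real Big Data pipeline the same guess-then-parse pattern has to be applied at much larger scale, and a misdetected header is one common consequence of the metadata and granularity ambiguity described above.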
Because of these very characteristics of Big Data, traditional architectures such as Symmetric
Multiprocessing (SMP) or Massively Parallel Processing (MPP) platforms, which are transaction-oriented
and disk-bound, cannot provide the scalability, throughput, and flexibility required to process data of
such varied types and volumes. The biggest problem with Big Data is its uncertainty, and its biggest
advantage is its nonrelational format.
 