data volume, user demands, and total cost. A shared-nothing architecture, such as a data warehouse
appliance, is better suited to processing data in the data warehouse environment.
The world of Big Data processing differs completely from the transaction-processing world, both in
the types of data involved and in the architecture required to process that data.
Big Data processing
Big Data is neither structured nor finite in state and volume. As discussed in Chapter 2, it comes in
many different formats and from many different sources, all of which need to be processed as Big
Data. The processing complexities of Big Data include the following:
1. Data volume—the amount of data generated every day, both within and outside the organization.
Internal data includes memos, contracts, analyst reports, competitive research, financial
statements, emails, call center data, supplier data, vendor data, customer data, and confidential
and sensitive data including HR and legal.
External data includes articles, videos, blogs, analyst reviews, forums, social media, sensor
networks, and mobile data.
2. Data variety—the different formats of data generated by different sources.
Excel spreadsheets and the associated formulas
Documents
Blogs and microblogs
Videos, images, and audio
Multilingual data
Mobile, sensor, and radio-frequency identification (RFID) data
3. Data ambiguity—the complexity of the data and the ambiguity surrounding its metadata and
granularity.
Comma-Separated Values (CSV) files may or may not contain header rows (see the sketch after this list)
Word documents come in multiple formats (e.g., a hospital's legal documents for patients versus
its pharmaceutical documentation)
Sensor data from mobile versus RFID networks
Microblog data from Twitter versus data from Facebook
4. Data velocity—the speed at which data is generated.
Sensor networks
Mobile devices
Social media
YouTube broadcasts
Streaming services such as Netflix and Hulu
Corporate documents and systems
Patient networks
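The CSV case under data ambiguity is the simplest to illustrate: without reliable metadata, a loader has to guess whether the first row is a header before it can assign column names. The following is a minimal sketch in Python using the standard library's csv module; the function name read_csv_with_unknown_header and the file name data.csv are illustrative only, not part of any particular product.

import csv

def read_csv_with_unknown_header(path):
    """Load a CSV file whose header row may or may not be present."""
    with open(path, newline="") as f:
        sample = f.read(4096)                     # small sample for detection
        f.seek(0)
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(sample)           # guess delimiter and quoting
        has_header = sniffer.has_header(sample)   # heuristic check for a header row
        if has_header:
            return list(csv.DictReader(f, dialect=dialect))   # rows keyed by column name
        return list(csv.reader(f, dialect=dialect))           # plain lists of values

# Example usage (data.csv is a hypothetical file):
# rows = read_csv_with_unknown_header("data.csv")

In a real Big Data pipeline the same guess-then-parse pattern has to be applied at much larger scale, and a misdetected header is one common consequence of the metadata and granularity ambiguity described above.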
Because of these very characteristics of Big Data, traditional architectures such as Symmetric
Multiprocessing (SMP) or Massively Parallel Processing (MPP) platforms, which are transaction-oriented
and disk-bound, cannot provide the scalability, throughput, and flexibility required to process data of
such varied types and volumes. The biggest problem with Big Data is its uncertainty, and its biggest
advantage is its nonrelational format.
 