Introducing Big Data Technologies - Data Warehousing in the Age of Big Data

● Data needs to be processed once, and processed to completion, because of its volume.

● Data needs to be processed from any point of failure, since it is too large to restart the process from the beginning.

● Velocity:

● Data needs to be processed at streaming speeds during data collection.

● Data needs to be processed from multiple acquisition points.

● Variety:

● Data of different formats needs to be processed.

● Data of different types needs to be processed.

● Data of different structures needs to be processed.

● Data from different regions needs to be processed.

● Ambiguity:

● Big Data is ambiguous by nature due to the lack of relevant metadata and context in many cases. An example is the use of M and F in a sentence: they can mean, respectively, Monday and Friday, male and female, or mother and father.

● Big Data that is within the corporation also exhibits this ambiguity, to a lesser degree. For example, employment agreements have standard and custom sections, and the latter are ambiguous without the right context.

● Complexity:

● The complexity of Big Data requires many algorithms to process data quickly and efficiently.

● Several types of data need multipass processing, so scalability is extremely important.
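The ambiguity of M and F described above can be illustrated with a minimal sketch. It assumes a hypothetical context tag (for example, supplied as field-level metadata) that names the domain a token belongs to; the context names and lookup table are illustrative assumptions, not part of any real schema.

```python
# Hypothetical lookup table: the context names and expansions are
# illustrative assumptions chosen to mirror the M/F example in the text.
CONTEXT_MEANINGS = {
    "weekday": {"M": "Monday", "F": "Friday"},
    "gender": {"M": "male", "F": "female"},
    "parent": {"M": "mother", "F": "father"},
}


def resolve(token, context=None):
    """Return the possible meanings of an abbreviation.

    With a context tag there is exactly one meaning; without one,
    every candidate meaning is returned because the token is ambiguous.
    """
    if context is not None:
        return [CONTEXT_MEANINGS[context][token]]
    return [
        meanings[token]
        for meanings in CONTEXT_MEANINGS.values()
        if token in meanings
    ]


ambiguous = resolve("M")                   # no metadata: three candidates
resolved = resolve("M", context="gender")  # metadata narrows it to one
```

Without the context tag the token maps to every candidate meaning; with it, the same token resolves to a single value, which is the role metadata plays in the text.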

Processing large-scale data requires an extremely high-performance computing environment that can be managed with ease and can be performance-tuned with linear scalability.
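The requirement that data be processed from any point of failure, rather than restarted from the beginning, can be sketched as a simple checkpointing loop. This is a minimal single-process illustration under stated assumptions: the function name `process_with_checkpoint`, the JSON checkpoint file, and the `next_index` key are all hypothetical and do not describe how any particular platform implements recovery.

```python
import json
import os
import tempfile


def process_with_checkpoint(records, checkpoint_path, handle):
    """Process records in order, saving the index of the next unprocessed one.

    Returns the index at which this run started, so a caller can see
    whether the run resumed from an earlier failure.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
    for i in range(start, len(records)):
        handle(records[i])  # if this raises, the checkpoint still points at i
        with open(checkpoint_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
    return start


records = ["r0", "r1", "r2", "r3", "r4"]
processed = []
failed_once = []


def flaky_handle(record):
    # Simulate a single transient failure on record "r3".
    if record == "r3" and not failed_once:
        failed_once.append(True)
        raise RuntimeError("simulated failure")
    processed.append(record)


checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    process_with_checkpoint(records, checkpoint_path, flaky_handle)
except RuntimeError:
    pass  # the first run stops at the point of failure

# The second run picks up at the failed record instead of starting over.
resumed_from = process_with_checkpoint(records, checkpoint_path, flaky_handle)
```

In this sketch the checkpoint is written only after a record is handled, so a crash between the two steps would cause that record to be handled again on restart. A system that must process each record exactly once would need idempotent handlers or an atomic commit of the result together with the checkpoint.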

Technologies for Big Data processing

Several technologies have come and gone in the data processing world, from mainframes, to two-tier databases, to virtual storage access method (VSAM) files. Several programming languages evolved to solve the puzzle of high-speed data processing and have either stayed niche or never found adoption. After the initial hype and bust of the Internet bubble, there came a moment in the history of data processing that caused unrest in the industry: the scalability of Internet search. Technology startups like Google, RankDex (now known as Baidu), and Yahoo, and open-source projects like Nutch, were all figuring out how to increase the performance of the search query so that it could scale without limit. Out of these efforts came the technologies that are now the foundation of Big Data processing. The focus of this section is to discuss the evolution and implementation of these technologies around

● Data movement

● Data storage

● Data management

Before we discuss the technology and architecture of Big Data platforms, let us take a few minutes to discuss one of the most powerful and game-changing technology innovations that revolutionized the landscape for Big Data platforms: the Google File System.
