Databases Reference
In-Depth Information
Data needs to be processed only once, and processed to completion, because of its volume.
Data processing needs to resume from any point of failure, since the data is too large to restart
the process from the beginning.
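The requirement to resume from a point of failure can be sketched with a simple checkpoint file. This is a minimal illustration, not a description of any particular platform: the function name `process_with_checkpoint`, the `handle` callback, and the JSON checkpoint layout are all assumptions made for this example.

```python
import json
import os


def process_with_checkpoint(records, handle, checkpoint_path):
    """Process records in order, resuming after the last checkpointed index.

    Hypothetical helper: returns the index at which this run started.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for i in range(start, len(records)):
        handle(records[i])
        # Write the checkpoint to a temporary file and rename it, so a crash
        # mid-write never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp_path, checkpoint_path)
    return start
```

In this sketch a record whose handler finished just before a crash, but before the checkpoint was written, would be handled again on restart. Processing strictly once would additionally require the handler to be idempotent or the checkpoint to be committed together with the handler's output.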
Velocity:
Data needs to be processed at streaming speeds during data collection.
Data from multiple acquisition points needs to be processed.
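One way to picture processing data from several acquisition points as it arrives is a lazy merge of time-ordered streams. The sketch below assumes each source yields `(timestamp, source, payload)` tuples already sorted by timestamp; that tuple layout and the name `merge_streams` are illustrative assumptions.

```python
import heapq


def merge_streams(*streams):
    """Lazily merge several time-ordered event streams into one.

    Each stream is assumed to yield (timestamp, source, payload) tuples in
    ascending timestamp order. Events are pulled only as they are consumed,
    so no stream has to be fully collected before processing begins.
    """
    return heapq.merge(*streams, key=lambda event: event[0])
```

Because `heapq.merge` returns an iterator, a consumer can start working on the earliest events while later ones are still being collected.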
Variety:
Data of different formats needs to be processed.
Data of different types needs to be processed.
Data of different structures needs to be processed.
Data from different regions needs to be processed.
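Handling different formats usually means mapping each one onto a common record shape before further processing. The following is a small sketch under assumed names (`PARSERS`, `load`) and two assumed input formats, JSON Lines and CSV, chosen only for illustration.

```python
import csv
import io
import json


def parse_json_lines(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_csv(text):
    """Parse CSV with a header row; every value is returned as a string."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical registry mapping a format label to its parser.
PARSERS = {"jsonl": parse_json_lines, "csv": parse_csv}


def load(text, fmt):
    """Return a list of dict records regardless of the input format."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return parser(text)
```

A registry of this kind keeps format-specific logic in one place, so supporting another format means adding a parser rather than changing downstream code. Differences in type (for example, CSV yielding only strings) would still need a separate normalization step.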
Ambiguity:
Big Data is ambiguous by nature due to the lack of relevant metadata and context in many
cases. An example is the use of M and F in a sentence—it can mean, respectively, Monday and
Friday, male and female, or mother and father.
Big Data that is within the corporation also exhibits this ambiguity, though to a lesser degree. For
example, employment agreements have standard and custom sections, and the latter are
ambiguous without the right context.
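The M and F example can be made concrete with a small lookup that shows how context metadata removes the ambiguity. The table `EXPANSIONS`, its context labels, and the function `resolve` are hypothetical and exist only to illustrate the point.

```python
# Hypothetical mapping from a context label to the meaning of each token.
EXPANSIONS = {
    "weekday": {"M": "Monday", "F": "Friday"},
    "gender": {"M": "male", "F": "female"},
    "parent": {"M": "mother", "F": "father"},
}


def resolve(token, context=None):
    """Return the possible meanings of a token.

    Without context, every candidate meaning is returned; with context
    metadata, the token resolves to a single meaning.
    """
    if context is None:
        return [m[token] for m in EXPANSIONS.values() if token in m]
    return [EXPANSIONS[context][token]]
```

Without a context label the token stays ambiguous and all candidates come back; supplying the label narrows the result to one meaning, which is the role metadata plays in the text above.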
Complexity:
The complexity of Big Data requires many algorithms to process data quickly and efficiently.
Several types of data need multipass processing, and scalability is extremely important.
Processing large-scale data requires an extremely high-performance computing environment that
can be managed easily and can be tuned to scale linearly.
Technologies for Big Data processing
Several technologies have come and gone in the data processing world, from mainframes to
two-tier databases to virtual storage access method (VSAM) files. Several programming
languages have evolved to solve the puzzle of high-speed data processing and have either stayed
niche or never found adoption. After the initial hype and bust of the Internet bubble, there came a
moment in the history of data processing that caused unrest in the industry: the challenge of scaling
Internet search. Technology startups like Google, RankDex (whose technology later became the basis of
Baidu), and Yahoo, and open-source projects like Nutch, were all figuring out how to increase the
performance of the search query and scale it indefinitely. Out of these efforts came the technologies
that are now the foundation of Big Data processing. The focus of this section is to discuss the
evolution and implementation of these technologies around:
Data movement
Data storage
Data management
Before we discuss the technology and architecture of Big Data platforms, let us take a few
minutes to discuss one of the most powerful and game-changing technology innovations that
revolutionized the landscape for Big Data platforms: the Google File System.
 