Information Technology Reference
In-Depth Information
TABLE 21.2
Value of Big Data across Industries
Industry                            Volume of Data  Velocity of Data  Variety of Data  Underutilized Data (Dark Data)  Big Data Value Potential
---------------------------------   --------------  ----------------  ---------------  ------------------------------  ------------------------
Banking and securities              High            High              Low              Medium                          High
Communications and media services   High            High              High             Medium                          High
Education                           Very low        Very low          Very low         High                            Medium
Government                          High            Medium            High             High                            High
Health-care providers               Medium          High              Medium           Medium                          High
Insurance                           Medium          Medium            Medium           Medium                          Medium
Manufacturing                       High            High              High             High                            High
Chemicals and natural resources     High            High              High             High                            Medium
Retail                              High            High              High             Low                             High
Transportation                      Medium          Medium            Medium           High                            Medium
Utilities                           Medium          Medium            Medium           Medium                          Medium

1. Principle of colocation of the data and the programs or algorithms that
perform the computation: To achieve high performance in big data
computing, it is important to minimize the movement of data. This
principle, "move the code to the data," was designed into the
data-parallel processing architecture implemented by Seisint in 2003.
It is extremely effective because program size is usually small in
comparison to the large data sets processed by big data systems, so
much less network traffic results when data are read locally instead
of across the network. Other types of computing and supercomputing
store data in a separate repository or on separate servers and
transfer the data to the processing system for computation. In direct
contrast, big data computing uses distributed data and distributed
file systems in which data are located across a cluster of processing
nodes; instead of moving the data, the program or algorithm is
transferred to the nodes holding the data that need to be processed.
Processing algorithms therefore execute on the nodes where the data
reside, reducing system overhead and increasing performance.
2. Programming model utilized: Big data computing systems utilize a
machine-independent approach in which applications are expressed
in terms of high-level operations on data, and the runtime system
transparently controls the scheduling, execution, load balancing,
communications, and movement of programs and data across the
distributed computing cluster. The programming abstraction and language
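
The two principles above can be illustrated with a minimal Python sketch. It is a single-process simulation, not a real distributed runtime, and it does not reproduce the Seisint architecture. The cluster layout, the function names (make_cluster, move_data_to_code, move_code_to_data), and the use of "values transferred" as a stand-in for network traffic are all assumptions made for illustration. The caller states only what to compute (a task and a way to combine partial results); the function playing the role of the runtime decides where the task runs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cluster: node name -> the partition of records stored on that node.
Cluster = Dict[str, List[int]]


def make_cluster(n_nodes: int, records_per_node: int) -> Cluster:
    """Spread consecutive integers across n_nodes partitions."""
    return {
        f"node-{i}": list(range(i * records_per_node, (i + 1) * records_per_node))
        for i in range(n_nodes)
    }


def sum_of_squares(records: List[int]) -> int:
    """The 'program': tiny compared with the data it processes."""
    return sum(x * x for x in records)


def move_data_to_code(
    cluster: Cluster, task: Callable[[List[int]], int]
) -> Tuple[int, int]:
    """Conventional approach: pull every record to one central processor."""
    gathered: List[int] = []
    for partition in cluster.values():
        gathered.extend(partition)  # every record crosses the (simulated) network
    return task(gathered), len(gathered)


def move_code_to_data(
    cluster: Cluster,
    task: Callable[[List[int]], int],
    combine: Callable[[List[int]], int],
) -> Tuple[int, int]:
    """Colocation: run the task where each partition lives, ship back only results."""
    partials = [task(partition) for partition in cluster.values()]
    return combine(partials), len(partials)  # one partial result per node


if __name__ == "__main__":
    cluster = make_cluster(n_nodes=4, records_per_node=1000)
    central, moved_central = move_data_to_code(cluster, sum_of_squares)
    local, moved_local = move_code_to_data(cluster, sum_of_squares, sum)
    print("same result:", central == local)
    print("values transferred (data to code):", moved_central)
    print("values transferred (code to data):", moved_local)
```

With four nodes of 1,000 records each, the first approach transfers 4,000 values and the second transfers 4 partial results. Both give the same answer only because a sum of squares can be assembled from per-partition sums; an operation that does not decompose this way would need a different combine step.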
 