Big Data Computing Applications - Guide to Cloud Computing for Business and Technology Managers

21.3 Additional Details on Big Data Technologies

21.3.1 Processing Approach

Current big data computing platforms use a divide-and-conquer parallel processing approach. It combines multiple processors and disks in large computing clusters connected by high-speed communications switches and networks. These clusters allow the data to be partitioned among the available computing resources and processed independently, so performance and scalability keep pace with the amount of data (Figure 5.1). We define a cluster as “a type of parallel and distributed system, which consists of a collection of inter-connected stand-alone computers working together as a single integrated computing resource.”
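The partition-then-process idea can be sketched in a few lines of Python. This is a minimal, single-machine illustration under stated assumptions. The "nodes" are simulated as separate lists, and records are assigned round-robin, which is only one of several possible partitioning schemes. The helper names `partition` and `divide_and_conquer` are hypothetical and are not part of any big data platform's API.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(records: Iterable[T], num_nodes: int) -> List[List[T]]:
    """Assign records round-robin so each simulated node gets a similar share."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    parts: List[List[T]] = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        parts[i % num_nodes].append(record)
    return parts


def divide_and_conquer(
    records: Iterable[T],
    num_nodes: int,
    local_task: Callable[[List[T]], R],
    combine: Callable[[List[R]], R],
) -> R:
    """Partition the data, process each partition independently, then merge."""
    parts = partition(records, num_nodes)
    # Each local_task call sees only its own partition (shared nothing);
    # empty partitions are skipped so tasks such as max() never see [].
    partials = [local_task(p) for p in parts if p]
    # The combine step is where the independent results are brought together.
    return combine(partials)
```

For example, `divide_and_conquer(range(1, 101), 4, sum, sum)` splits 100 numbers across four simulated nodes and sums each partition locally. It then adds the four partial sums to get 5050. The final combine step is the only point where results from different nodes meet.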

This approach to parallel processing is often referred to as a shared-nothing approach. Each node consists of a processor, local memory, and disk resources, and it shares none of these with the other nodes in the cluster. In parallel computing, this approach is considered suitable for data processing problems that are embarrassingly parallel. In such problems it is relatively easy to separate the work into a number of parallel tasks, and the tasks need no dependency on or communication with each other beyond overall management of the tasks. These types of data processing problems are inherently adaptable to various forms of distributed computing, including clusters, data grids, and cloud computing.
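An embarrassingly parallel job can be illustrated with Python's standard `concurrent.futures` module. In this hedged sketch, word counting stands in for any per-chunk computation, and `count_words` and `run_job` are hypothetical names. Each task reads only its own chunk. The coordinator's role is limited to dispatching tasks and merging results, which mirrors the "overall management" described above. A thread pool is used by default purely for portability, and it only simulates separate nodes.

```python
from collections import Counter
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List, Optional


def count_words(chunk: str) -> Counter:
    """Independent task: its result depends only on its own input chunk."""
    return Counter(chunk.lower().split())


def run_job(chunks: List[str], executor: Optional[Executor] = None) -> Counter:
    """Coordinator: dispatch one task per chunk, then merge the results.

    Tasks never exchange data with each other; the only coordination is
    handing them out and combining what they return.
    """
    if executor is None:
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(count_words, chunks))
    else:
        partials = list(executor.map(count_words, chunks))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # merging is the only step that sees all results
    return total
```

Because the tasks never communicate, `run_job` could also be given a `concurrent.futures.ProcessPoolExecutor` to run them in separate processes. The executor should be created under an `if __name__ == "__main__":` guard, and the task code does not need to change.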

21.3.2 Big Data System Architecture

A variety of system architectures have been implemented for big data and large-scale data analysis applications. These include parallel and distributed relational database management systems, which have been able to run on shared-nothing clusters of processing nodes for more than two decades. Examples are database systems from Teradata, Netezza, Vertica, Exadata/Oracle, and others, which provide high-performance parallel database platforms. Although these systems can run parallel applications and queries expressed in SQL, they are typically not general-purpose processing platforms. They usually run as a back-end to a separate front-end application processing system.
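One common way a shared-nothing parallel database can evaluate a grouped aggregate is two-phase aggregation. Each node first aggregates its own rows, and then only the small partial results travel to a coordinator. The Python sketch below illustrates that idea for a hypothetical `sales` table, and the function names are also hypothetical. It is not a description of how Teradata, Netezza, Vertica, or Exadata actually plan queries.

```python
"""Two-phase aggregation sketch for a query such as:
    SELECT region, SUM(amount) FROM sales GROUP BY region;

Assumption: each node is simulated as a list of (region, amount) rows that
only that node reads; a coordinator merges the per-node partial results.
"""
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]


def local_aggregate(node_rows: List[Row]) -> Dict[str, float]:
    """Phase 1: runs on each node against its local rows only."""
    partial: Dict[str, float] = defaultdict(float)
    for region, amount in node_rows:
        partial[region] += amount
    return dict(partial)


def merge_aggregates(partials: List[Dict[str, float]]) -> Dict[str, float]:
    """Phase 2: the coordinator adds up the per-node subtotals."""
    result: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for region, subtotal in partial.items():
            result[region] += subtotal
    return dict(result)


def parallel_group_sum(nodes: List[List[Row]]) -> Dict[str, float]:
    """Run phase 1 on every simulated node, then phase 2 on the coordinator."""
    return merge_aggregates([local_aggregate(rows) for rows in nodes])
```

This works because SUM is decomposable: the total of the partial sums equals the sum over all rows. An aggregate such as a median has no equally simple merge step.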

This approach offers benefits when the data utilized are primarily structured in nature and fit easily into the constraints of a relational database, and it often excels for transaction processing applications. However, most data growth is in unstructured data, so new processing paradigms with more flexible data models were needed. Internet companies such as Google, Yahoo, Microsoft, Facebook, and others required a new processing approach to deal effectively with the enormous amount of Web data for applications such as search engines and social networking. In addition,
