Greenplum Unified Analytics Platform (UAP) - Getting Started with Greenplum for Big Data Analytics


In the next sections, we will discuss the polymorphic data storage capabilities of Greenplum that help combine the best of both worlds seamlessly.

Parallel versus distributed computing/processing

Parallel systems have been around for a while, and the paradigm that has gained traction in the Big Data world is distributed systems. In this section, let us explore how parallel and distributed systems compare and contrast conceptually.

To understand parallel systems, we will use a simple classification, Flynn's taxonomy (1966). Flynn classified parallel systems along two dimensions: instruction streams and data streams. The following figure is a representation of Flynn's taxonomy:

• Single Instruction Single Data (SISD): This is the case of a single processor with no parallelism in either data or instructions. A single instruction stream operates on a single data stream sequentially; a uniprocessor is the typical example.

• Multiple Instruction Single Data (MISD): Multiple instruction streams operate on a single data stream. A typical example is fault tolerance, where the same data is processed redundantly to detect and mask errors.

• Single Instruction Multiple Data (SIMD): This is the case of data parallelism; a single instruction triggers the same operation on multiple data streams at once.

• Multiple Instruction Multiple Data (MIMD): Multiple independent instruction streams operate on multiple independent data streams.
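The four categories can be made concrete with a small Python sketch. This is a conceptual illustration under stated assumptions, not a model of real hardware: the function names sisd, simd, misd and mimd are hypothetical, map() only imitates the "one operation, many data items" idea of SIMD (real SIMD runs vector lanes in lock-step), the three MISD implementations are invented stand-ins for redundant fault-tolerant units, and ThreadPoolExecutor only approximates MIMD because CPython's GIL prevents CPU-bound Python threads from truly running in parallel.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def sisd(data):
    """SISD: one instruction stream handles one data item at a time."""
    result = []
    for x in data:  # strictly sequential processing
        result.append(x * 2)
    return result


def simd(data):
    """SIMD (conceptual): the same operation applied uniformly to every element."""
    return list(map(lambda x: x * 2, data))


def misd(value):
    """MISD (conceptual): several different instruction streams work on the
    same datum; hypothetical redundant implementations vote on the answer."""
    implementations = [
        lambda v: v * 2,
        lambda v: v + v,
        lambda v: v << 1,  # assumes an integer input
    ]
    answers = [impl(value) for impl in implementations]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner if votes >= 2 else None  # None signals no majority


def mimd(tasks):
    """MIMD (approximated): independent functions on independent data."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, arg) for fn, arg in tasks]
        return [f.result() for f in futures]


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(sisd(data))  # [2, 4, 6, 8]
    print(simd(data))  # [2, 4, 6, 8]
    print(misd(21))    # 42
    print(mimd([(sum, [1, 2, 3]), (max, [7, 3, 9]), (len, "greenplum")]))  # [6, 9, 9]
```

SISD and SIMD produce the same result here; the difference Flynn's taxonomy captures is how the work is issued (one element at a time versus one instruction over many elements), which plain Python cannot show at the hardware level.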

Because there are multiple data streams, memory can be either shared or distributed, and distributed processing falls into this category. The previous figure depicts MIMD and its variation in a distributed context.
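The difference between shared-memory and distributed-memory MIMD can be sketched as two ways of summing a data set. This is a minimal sketch assuming Python threads: in shared_memory_sum (a hypothetical name) every worker updates one total guarded by a lock, while message_passing_sum (also hypothetical) only models distributed memory, since each worker touches just its own partition and reports a message on a queue to a coordinator. A real distributed system would run the workers as separate processes or on separate machines.

```python
import queue
import threading


def shared_memory_sum(data, n_workers=4):
    """Shared-memory MIMD: all workers update one total guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # serialize access to the shared variable
            total += partial

    chunks = [data[i::n_workers] for i in range(n_workers)]  # round-robin split
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(data, n_workers=4):
    """Distributed-memory MIMD (modelled): each worker owns a private partition
    and communicates only by sending a message; no variable is shared."""
    outbox = queue.Queue()

    def worker(worker_id, local_partition):
        outbox.put((worker_id, sum(local_partition)))  # send a result message

    partitions = [data[i::n_workers] for i in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(i, p))
        for i, p in enumerate(partitions)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The coordinator gathers exactly one message per worker and combines them.
    return sum(outbox.get()[1] for _ in range(n_workers))


if __name__ == "__main__":
    numbers = list(range(1, 101))
    print(shared_memory_sum(numbers))    # 5050
    print(message_passing_sum(numbers))  # 5050
```

The shared-memory version needs a lock because workers contend for the same variable; the message-passing version avoids shared state entirely, which loosely resembles a coordinator combining partial results from workers that each own a slice of the data.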

One of the critical requirements of parallel and distributed processing systems is high availability and fault tolerance. There are several programming paradigms for implementing parallelism. The following list details the important ones:
