Application Architectures for Big Data and Analytics - Big Data Imperatives


Table 5-2. RDBMS and Hadoop characteristics

| Relational DBMSs | Map-Reduce/Hadoop |
|---|---|
| Mostly proprietary | Open source |
| Expensive; Total Cost of Ownership (TCO) grows exponentially | Less expensive; Total Cost of Ownership (TCO) grows linearly |
| Data structures are rigid and need to be modeled in advance | Flexible data structures; little to no modeling required |
| Great for speedy indexed lookups | Great for massive full data scans |
| Rich relational semantics | Indirect support for relational semantics, e.g., Hive |
| Indirect support for complex data structures | Deep support for complex data structures |
| Indirect support for complex algorithms, iterations, and branching operations | Deep support for iterations, branching operations, and complex algorithms |
| Deep support for transaction processing | Little to no support for transaction processing |

Several components within the Hadoop environment perform data management operations. They are listed below with their functionality and the roles they play, grouped by data management function as relevant to a data warehouse scenario.

• Hadoop Distributed File System (HDFS): HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system adept at storing large volumes of unstructured data.

• MapReduce: MapReduce is a software framework that serves as the compute layer of Hadoop. A MapReduce job runs in two phases. In the map phase, the input is split into parts and each part is processed in parallel at the node level. In the reduce phase, the results of the map phase are aggregated to produce the answer to the query.

• Hive: Hive is a Hadoop-based data warehouse originally developed by Facebook. It allows users to write queries in an SQL-like language, which are then converted to map-reduce jobs. This lets SQL programmers with no map-reduce experience use the warehouse and makes it easier to integrate with business intelligence and visualization tools such as MicroStrategy, Tableau, and Revolution Analytics.
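The map and reduce phases described in the list above can be sketched in a few lines of plain Python. This is only an illustration of the idea on a single machine, not Hadoop's actual API: the function names (map_phase, shuffle, reduce_phase, run_job) and the sample input are assumptions made for this example, and a real job would run each split on a separate node.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)

def run_job(splits):
    # On a cluster each split is mapped on a different node; here we just loop.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

# Two hypothetical input splits standing in for parts of a large file.
splits = [["big data big"], ["data warehouse"]]
counts = run_job(splits)
print(counts)
```

The grouping step between the two phases is what lets each reduce call see every value for its key, regardless of which split produced it.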

Hive, initially a sub-project of Hadoop, evolved to provide a formal query capability. In effect, Hive turns Hadoop into something like a data warehouse system, allowing data summarization, ad hoc queries, and analysis of data stored in Hadoop. Hive holds metadata describing the contents of files and allows queries in HiveQL, an SQL-like language. It also lets map-reduce programmers work around the limitations of HiveQL by plugging in their own map-reduce routines.
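One way such a routine can be plugged in is through HiveQL's TRANSFORM clause, which streams rows to an external script as tab-separated lines on standard input and reads tab-separated lines back from standard output. The following is a minimal sketch of such a script; the table name page_views, the columns user_id and url, and the file name extract_domain.py are hypothetical and chosen only for illustration.

```python
import sys

# Hypothetical HiveQL that would invoke this script:
#   ADD FILE extract_domain.py;
#   SELECT TRANSFORM (user_id, url)
#     USING 'python extract_domain.py'
#     AS (user_id, domain)
#   FROM page_views;

def transform(lines):
    """Turn 'user_id<TAB>url' rows into 'user_id<TAB>domain' rows."""
    for line in lines:
        user_id, url = line.rstrip("\n").split("\t")
        # Drop the scheme, then keep everything before the first slash.
        domain = url.split("//")[-1].split("/")[0]
        yield f"{user_id}\t{domain}"

def main():
    # Under Hive, rows arrive on stdin and results leave on stdout.
    for row in transform(sys.stdin):
        print(row)

# Local check with two made-up rows; a deployed script would call main() instead.
sample = ["u1\thttp://example.com/a/b\n", "u2\thttps://example.org/\n"]
result = list(transform(sample))
print(result)
```

Keeping the row logic in transform, separate from the stdin loop, makes the routine easy to check locally before it is attached to a query.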
