Table 5-2. RDBMS and Hadoop characteristics

Relational DBMSs | MapReduce/Hadoop
Mostly proprietary | Open source
Expensive; Total Cost of Ownership (TCO) grows exponentially | Less expensive; Total Cost of Ownership (TCO) is linear
Data structures are rigid and must be modeled in advance | Flexible data structures; little to no modeling required
Great for speedy indexed lookups | Great for massive full data scans
Rich relational semantics | Indirect support for relational semantics, e.g., Hive
Indirect support for complex data structures | Deep support for complex data structures
Indirect support for complex algorithms, iterations, and branching operations | Deep support for iterations, branching operations, and complex algorithms
Deep support for transaction processing | Little to no support for transaction processing

Several components within the Hadoop environment perform data management operations.
Their functionality and the roles they play are listed below, grouped under the data
management functions relevant to a data warehouse scenario.
Hadoop Distributed File System: HDFS, the storage layer of
Hadoop, is a distributed, scalable, Java-based file system adept at
storing large volumes of unstructured data.
MapReduce: MapReduce is a software framework that serves
as the compute layer of Hadoop. MapReduce jobs are divided
into two parts. The map function runs in parallel across the
cluster, with each node processing its portion of the data and
emitting intermediate results. The reduce function aggregates
the results of the map function to determine the answer to the
query.
Hive: Hive is a Hadoop-based data warehouse developed by
Facebook. It allows users to write queries in SQL-like syntax, which are then
converted to MapReduce jobs. This allows SQL programmers with no
MapReduce experience to use the warehouse and makes it easier
to integrate with business intelligence and visualization tools
such as MicroStrategy, Tableau, and Revolution Analytics.
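
The map and reduce phases described for MapReduce above can be illustrated with a minimal, single-process sketch in Python. It is not Hadoop code and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) and the sample data are assumptions made for illustration. Each inner list in splits stands in for the portion of the data that one node would process.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate all values emitted for one key."""
    return key, sum(values)


def run_job(splits):
    """Run map over every split, then shuffle and reduce (hypothetical driver)."""
    mapped = []
    for split in splits:  # on a cluster, each split would be handled by a different node
        for record in split:
            mapped.extend(map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Illustrative input: two splits, as if stored on two nodes.
splits = [["big data big"], ["data warehouse"]]
counts = run_job(splits)
print(counts)
```

In a real cluster the map calls run in parallel on the nodes holding the data, and the grouping step is performed by the framework rather than by user code.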
Hive, initially a sub-project of Hadoop, evolved to provide a formal query capability.
In effect, Hive turns Hadoop into something like a data warehouse system, allowing data
summarization, ad hoc queries, and the analysis of data stored by Hadoop. Hive holds
metadata describing the contents of files and allows queries in HiveQL, an SQL-like
language. It also allows map-reduce programmers to get around the limitations of HiveQL
by plugging in map-reduce routines.
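
To make the idea of metadata plus query translation concrete, the following Python sketch shows, under simplifying assumptions, how a schema describing a raw delimited file lets a grouped aggregate (roughly what a query of the form SELECT region, SUM(amount) ... GROUP BY region asks for) be expressed as a map step followed by a reduce step. This is not how Hive is implemented and it uses no Hive API; TABLE_META, parse_row, sum_by, and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical table metadata: column names and field delimiter for a raw text file.
TABLE_META = {"columns": ["region", "product", "amount"], "delimiter": ","}


def parse_row(line, meta):
    """Use the metadata to turn one raw line into a column-name -> value mapping."""
    values = line.strip().split(meta["delimiter"])
    return dict(zip(meta["columns"], values))


def sum_by(lines, meta, group_col, value_col):
    """Sum value_col per distinct group_col value using a map step and a reduce step."""
    # Map step: parse each line and emit (group key, numeric value).
    pairs = (
        (row[group_col], float(row[value_col]))
        for row in (parse_row(line, meta) for line in lines)
    )
    # Reduce step: add up the values emitted for each key.
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Illustrative raw file contents.
raw_lines = ["east,widget,10.0", "west,widget,5.5", "east,gadget,2.5"]
totals = sum_by(raw_lines, TABLE_META, "region", "amount")
print(totals)
```

The schema lives apart from the data file, which is why the same raw lines could be reinterpreted simply by changing the metadata.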
 
 