Flume
Used for collecting and aggregating large amounts of log/event data on HDFS and
deployed as a service
Fuse-DFS
Enables integration with other systems for data import and export by allowing
mounting of HDFS volumes using the Linux FUSE filesystem
HBase
A columnar database with support for data summaries and ad hoc queries
Hive
A SQL-like language, metadata repository, and data warehousing framework with
a rudimentary rules-based optimizer for Hadoop
Mahout
A machine learning and data mining programming library
Oozie
A workflow engine and job scheduler for Hadoop
Pig
A high-level dataflow programming language and compiler for producing and
executing MapReduce programs
Sqoop
A tool used in transferring data between Hadoop and relational databases that uses
MapReduce for import and/or export and supports direct data import into Hive
tables
ZooKeeper
The coordination service for distributed applications running on Hadoop
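Several of the components above (Pig and Sqoop, for example) rely on MapReduce. As a rough illustration of that programming model only, the following is a minimal single-process sketch in plain Python. It does not use Hadoop or any of its APIs; the function names and the sample log lines are hypothetical, and the log level is assumed to be the first token of each line.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit (log_level, 1) for each non-empty line (level = first token, an assumption)."""
    for line in lines:
        tokens = line.split()
        if tokens:
            yield tokens[0], 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_levels(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


# Hypothetical sample data, not taken from any real system.
sample_log = [
    "ERROR disk full",
    "INFO job started",
    "ERROR timeout",
    "",
    "WARN slow response",
]
counts = count_levels(sample_log)
print(counts)
```

On a real cluster the map and reduce steps would run in parallel across many nodes over data stored in HDFS; the sketch only shows how the work is divided between the two phases.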
Though MapReduce-like functionality is supported in Oracle Database 12c through
pattern matching (as we will note later), most organizations will likely continue to deploy
separate optimized Hadoop clusters when analyzing such data. The data warehouse
greatly complements the Hadoop cluster platform as the relational database serves as a
destination for the Big Data of value and provides a standard SQL interface for querying
all data. In addition, the data warehouse is generally deployed for higher availability and
better recovery than a Hadoop cluster, and provides higher levels of security than is
possible with Hadoop today. We will describe later in this chapter how the data warehousing
topology often evolves when Big Data is included.
Data Warehouse Design
The database serves as the foundation of the business intelligence infrastructure: it is
the place where the data is stored. But there is more to business intelligence than data