Databases Reference
In-Depth Information
Pig: Pig is a Hadoop-based platform whose language, Pig Latin,
was originally developed at Yahoo Research. It is a high-level
data flow language for building map-reduce programs for Hadoop,
thus simplifying the use of map-reduce. It is relatively easy to
learn and is adept at very deep, very long data pipelines (a
limitation of SQL).
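To make the data flow idea concrete, here is a minimal Python sketch (not Pig Latin itself) of a pipeline written as successive load, filter, group, and aggregate steps. The sample records and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: (user, url, seconds_on_page)
RECORDS = [
    ("alice", "/home", 12),
    ("bob", "/home", 3),
    ("alice", "/cart", 40),
    ("carol", "/cart", 25),
    ("bob", "/cart", 1),
]

def load(records):
    """Step 1: yield raw records one at a time."""
    yield from records

def keep_long_visits(rows, min_seconds):
    """Step 2: drop visits shorter than min_seconds."""
    return (row for row in rows if row[2] >= min_seconds)

def group_by_url(rows):
    """Step 3: group the surviving rows by URL."""
    groups = defaultdict(list)
    for _user, url, seconds in rows:
        groups[url].append(seconds)
    return groups

def total_seconds(groups):
    """Step 4: aggregate each group into a single total."""
    return {url: sum(values) for url, values in groups.items()}

def run_pipeline(records, min_seconds=10):
    """Chain the steps; each one consumes the output of the previous one."""
    rows = keep_long_visits(load(records), min_seconds)
    return total_seconds(group_by_url(rows))
```

Each step is a named transformation over the previous step's output, which is the property that keeps long pipelines readable compared with one deeply nested query.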
HBase: HBase is a non-relational database that allows for
low-latency lookups in Hadoop. It adds transactional
capabilities to Hadoop, allowing users to conduct updates,
inserts, and deletes. eBay and Facebook use HBase heavily.
Flume: Flume is a framework for populating Hadoop with
data. Agents are deployed throughout one's IT infrastructure
(inside web servers, application servers, and mobile devices, for
example) to collect data and integrate it into Hadoop.
Oozie: Oozie is a workflow processing system that lets users
define a series of jobs written in multiple languages (such as
map-reduce, Pig, and Hive) and then intelligently links them to
one another.
Oozie allows users to specify, for example, that a particular query
is only to be initiated after specified previous jobs on which it
relies for data are completed.
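The dependency rule described for Oozie can be sketched in Python as a topological ordering of jobs. This is not Oozie's own workflow definition format; the job names and the `execution_order` helper are hypothetical, and the ordering relies on the standard library's `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "import_logs": set(),
    "clean_logs": {"import_logs"},
    "build_sessions": {"clean_logs"},
    "load_customers": set(),
    "report_query": {"build_sessions", "load_customers"},
}

def execution_order(workflow):
    """Return one valid order in which every job follows its prerequisites."""
    return list(TopologicalSorter(workflow).static_order())

def run(workflow, runner):
    """Invoke runner(job) for each job, prerequisites first."""
    for job in execution_order(workflow):
        runner(job)
```

Here `report_query` is only started after both `build_sessions` and `load_customers` have run, regardless of the order in which the jobs were declared.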
Whirr: Whirr is a set of libraries that allows users to easily spin up
Hadoop clusters on top of Amazon EC2, Rackspace, or any virtual
infrastructure. It supports all major virtualized infrastructure
vendors on the market.
Avro: Avro is a data serialization system that allows for encoding
the schema of Hadoop files. It is adept at parsing data and
performing remote procedure calls.
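The idea of storing a schema alongside the data can be illustrated with a simplified Python sketch. It does not use the Avro library or Avro's binary encoding; it only borrows the fact that Avro schemas are written in JSON. The `PageView` record, its fields, and the helper functions are hypothetical.

```python
import json

# An Avro-style record schema expressed as JSON-compatible data.
SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "seconds", "type": "int"},
    ],
}

# Only the few primitive types this sketch needs.
PYTHON_TYPES = {"string": str, "int": int, "boolean": bool, "double": float}

def conforms(record, schema):
    """Check that a dict has exactly the schema's fields with matching types."""
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    return all(
        type(record[field["name"]]) is PYTHON_TYPES[field["type"]]
        for field in fields
    )

def write_file(records, schema):
    """Serialize the schema together with the records it describes."""
    for record in records:
        if not conforms(record, schema):
            raise ValueError(f"record does not match schema: {record!r}")
    return json.dumps({"schema": schema, "records": records})

def read_file(payload):
    """Recover both the schema and the records from the serialized payload."""
    container = json.loads(payload)
    return container["schema"], container["records"]
```

Because the schema travels with the payload, a reader can interpret the records without any outside metadata.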
Mahout: Mahout is a data-mining library. It takes the most
popular data-mining algorithms for performing clustering,
regression, and statistical modeling and implements them
using the map-reduce model.
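As an illustration of how a clustering algorithm fits the map-reduce model, the following Python sketch expresses one iteration of k-means as a map step and a reduce step. It is not Mahout's implementation; the function names are hypothetical, and the points are one-dimensional to keep the example short.

```python
from collections import defaultdict

def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
    return nearest, point

def reduce_cluster(points):
    """Reduce step: the new centroid is the mean of its assigned points."""
    return sum(points) / len(points)

def kmeans_iteration(points, centroids):
    """Run one map, shuffle, and reduce pass and return the updated centroids."""
    # Shuffle: group mapped values by key, as the framework would
    # between the map and reduce phases.
    assigned = defaultdict(list)
    for point in points:
        key, value = map_point(point, centroids)
        assigned[key].append(value)
    # A centroid with no assigned points keeps its old position.
    return [
        reduce_cluster(assigned[i]) if i in assigned else centroids[i]
        for i in range(len(centroids))
    ]
```

Calling `kmeans_iteration` repeatedly until the centroids stop moving gives the full algorithm; in a real cluster each iteration would be a separate map-reduce job.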
Sqoop: Sqoop is a connectivity tool for moving data from
non-Hadoop data stores such as relational databases and data
warehouses into Hadoop. It allows users to specify the target
location inside of Hadoop and instruct Sqoop to move data from
Oracle, Teradata, or other relational databases to the target.
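The general pattern of copying a relational table to a chosen target location can be sketched in Python. This is not Sqoop and does not use its command-line interface: SQLite stands in for the relational source, a local directory stands in for the Hadoop target, and the `export_table` helper is hypothetical.

```python
import csv
import sqlite3
from pathlib import Path

def export_table(connection, table, target_dir):
    """Copy every row of `table` into a CSV file under `target_dir`."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    out_path = target / f"{table}.csv"
    # Table names cannot be bound as SQL parameters, so the name is
    # interpolated here; only pass trusted table names.
    cursor = connection.execute(f'SELECT * FROM "{table}"')
    header = [column[0] for column in cursor.description]
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(cursor)  # stream rows straight from the cursor
    return out_path
```

The caller names the source table and the target directory, and the function handles the transfer, which mirrors the division of responsibility the entry describes.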
BigTop: BigTop is an effort to create a more formal process or
framework for packaging and interoperability testing of Hadoop's
sub-projects and related components with the goal of improving the
Hadoop platform as a whole.
Clearly, native Hadoop is not a database by any stretch of the imagination. However,
once it became popular, it was inevitable that Hadoop would soon evolve to adopt some
of the characteristics of a database. HBase, another open source project, stepped in to
 