Application Architectures for Big Data and Analytics - Big Data Imperatives


• Pig: Pig is a high-level platform for building map-reduce programs

for Hadoop, originally developed at Yahoo Research. Its language,

Pig Latin, is a data flow language that is relatively easy to learn

and well suited to very deep, very long data pipelines, which are

awkward to express in SQL. Its high-level commands simplify the

use of map-reduce.

• HBase: HBase is a non-relational database that allows for

low-latency lookups in Hadoop. It adds row-level transactional

capabilities to Hadoop, allowing users to perform updates,

inserts, and deletes. eBay and Facebook use HBase heavily.

• Flume: Flume is a framework for populating Hadoop with

data. Agents are deployed throughout one's IT infrastructure

(inside web servers, application servers, and mobile devices, for

example) to collect data and integrate it into Hadoop.

• Oozie: Oozie is a workflow processing system that lets users

define a series of jobs written in multiple languages (such as

map-reduce, Pig, and Hive) and then links them to one another.

Oozie allows users to specify, for example, that a particular query

is only to be initiated after specified previous jobs on which it

relies for data are completed.

• Whirr: Whirr is a set of libraries that allows users to easily spin up

Hadoop clusters on top of Amazon EC2, Rackspace, or other virtual

infrastructure. It supports the major virtualized infrastructure

vendors on the market.

• Avro: Avro is a data serialization system that allows for encoding

the schema of Hadoop files. It is adept at parsing data and

performing remote procedure calls.

• Mahout: Mahout is a data-mining library. It takes the most

popular data-mining algorithms for performing clustering,

regression, and statistical modeling and implements them

using the map-reduce model.

• Sqoop: Sqoop is a connectivity tool for moving data from

non-Hadoop data stores such as relational databases and data

warehouses into Hadoop. It allows users to specify the target

location inside of Hadoop and instruct Sqoop to move data from

Oracle, Teradata, or other relational databases to the target.

• BigTop: BigTop is an effort to create a more formal process or

framework for packaging and interoperability testing of Hadoop's

sub-projects and related components with the goal of improving the

Hadoop platform as a whole.
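
The "data flow" idea behind Pig can be sketched without Hadoop at all. The example below is plain Python, not Pig Latin, and every name in it (the sample records, load, keep_successful, count_by_url) is a hypothetical stand-in chosen for illustration. It only shows the shape of a pipeline in which each stage consumes the output of the previous one.

```python
from collections import Counter

# Hypothetical input: (user, url, http_status) click records.
records = [
    ("alice", "/home", 200),
    ("bob", "/home", 404),
    ("alice", "/cart", 200),
    ("carol", "/home", 200),
]

def load(rows):
    """Stage 1: yield raw records one at a time."""
    yield from rows

def keep_successful(rows):
    """Stage 2: keep only records whose HTTP status is 200."""
    return (r for r in rows if r[2] == 200)

def count_by_url(rows):
    """Stage 3: group records by URL and count each group."""
    return Counter(url for _, url, _ in rows)

# Chain the stages; data flows from load to filter to group-and-count.
hits = count_by_url(keep_successful(load(records)))
print(dict(hits))
```

In a real Pig script each stage would be a Pig Latin statement that Pig compiles into map-reduce jobs; here the stages are ordinary generators running in one process.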
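
The dependency rule described for Oozie (a query starts only after the jobs it relies on have completed) amounts to ordering a dependency graph. The sketch below is not Oozie's API or its workflow definition format. It uses Python's standard graphlib module (Python 3.9 or later), and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "import_logs": set(),
    "clean_logs_pig": {"import_logs"},
    "sessionize_mapreduce": {"clean_logs_pig"},
    "report_hive_query": {"clean_logs_pig", "sessionize_mapreduce"},
}

def run_order(jobs):
    """Return an order in which every job follows all of its prerequisites."""
    return list(TopologicalSorter(jobs).static_order())

order = run_order(workflow)
print(order)
```

A scheduler built on this idea would launch each job only once everything before it in the order has finished. A cycle in the dependencies makes TopologicalSorter raise graphlib.CycleError, which is the point at which an invalid workflow would be rejected.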
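
To make "implements them using the map-reduce model" concrete for a clustering algorithm, here is a minimal single-process sketch of k-means split into a map step and a reduce step. It is not Mahout code. The function names, the one-dimensional sample points, and the starting centroids are all assumptions made for illustration.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Return the index of the centroid closest to a 1-D point."""
    return min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))

def map_step(points, centroids):
    """Map: emit a (centroid_index, point) pair for every input point."""
    return [(nearest(p, centroids), p) for p in points]

def reduce_step(pairs, centroids):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # A centroid with no assigned points keeps its old position.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else c
            for i, c in enumerate(centroids)]

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = [0.0, 5.0]  # hypothetical starting guesses
for _ in range(5):
    centroids = reduce_step(map_step(points, centroids), centroids)
print(centroids)
```

On a cluster, the map step would run in parallel over blocks of the input and the reduce step would run once per centroid key; each pass of the loop corresponds to one map-reduce job.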

Clearly, native Hadoop is not a database by any stretch of the imagination. However,

once it became popular, it was inevitable that Hadoop would soon evolve to adopt some

of the characteristics of a database. HBase, another open source project, stepped in to
