come in very handy for such needs. Before we explore these tools in detail, let's go
over a brief introduction to each of them.
Apache Pig allows programmers to write MapReduce implementations in the form of
scripts. Pig translates each Pig script into Hadoop-compatible MapReduce jobs. Its
built-in functions and data types make it quick to write reusable, Pig-powered
MapReduce implementations. People building data pipelines or ETL-type solutions
often prefer Pig because it is procedural rather than declarative. Since it is
procedural, you can create checkpoints and plug in custom code at any point in the
workflow.
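To make the procedural style concrete, here is a minimal sketch in plain Python (not Pig Latin, and not how Pig executes anything). It only illustrates a step-by-step data flow with an intermediate checkpoint and a custom function plugged in midway. The sample records and the normalize helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: (user, url) page-view records.
records = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/cart"),
    ("", "/home"),  # malformed row with no user
]

# Step 1: filter out rows that have no user.
cleaned = [r for r in records if r[0]]

# Checkpoint: the intermediate result can be inspected or stored here
# before the rest of the pipeline runs.
checkpoint = list(cleaned)

# Step 2: custom code plugged into the middle of the workflow
# (hypothetical helper, for illustration only).
def normalize(record):
    user, url = record
    return (user.upper(), url)

normalized = [normalize(r) for r in cleaned]

# Step 3: group by user and count the views per user.
counts = defaultdict(int)
for user, _url in normalized:
    counts[user] += 1

result = dict(counts)
print(result)
```

Each step produces a named intermediate result, which is what makes it easy to checkpoint or insert extra processing between steps.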
Apache Hive enables users to manage and analyze large data sets using an SQL-like
query language. Because SQL is popular and widely used across the industry, Hive
lets programmers adopt the Hadoop and HBase big data platforms quickly by providing
an SQL-like interface, namely Hive Query Language (Hive QL). It is generally used for
ad-hoc SQL-based analytics. With Hive QL we can perform various DDL and DML
operations in an SQL manner. Data definition language (DDL) is used for performing
tasks like creating and altering tables, and data manipulation language (DML) is used
to do things like inserting and deleting records. DDL and DML semantics are similar to
SQL's. You can refer to
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DDLOperations
for more information about DDL. Hive's data partitioning and external table support
give users the added advantage of declaring and analyzing data that resides on
external file systems. We will cover this in a later part of this chapter.
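The DDL/DML distinction can be sketched with ordinary SQL. The example below uses Python's built-in sqlite3 module as a stand-in, so it is SQLite rather than Hive; Hive QL's syntax and its supported operations differ in places. The page_views table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create a table, then alter it by adding a column.
cur.execute("CREATE TABLE page_views (user_name TEXT, url TEXT)")
cur.execute("ALTER TABLE page_views ADD COLUMN view_count INTEGER")

# DML: insert records, then delete some of them.
cur.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("alice", "/home", 3),
        ("bob", "/home", 1),
        ("alice", "/cart", 2),
        ("carol", "/home", 1),
    ],
)
cur.execute("DELETE FROM page_views WHERE user_name = 'bob'")
conn.commit()

# Ad-hoc analytic query over the remaining rows.
rows = cur.execute(
    "SELECT user_name, SUM(view_count) FROM page_views "
    "GROUP BY user_name ORDER BY user_name"
).fetchall()
print(rows)
conn.close()
```

The CREATE and ALTER statements change the table's structure (DDL), while INSERT and DELETE change its contents (DML).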
Sqoop stands for SQL to Hadoop. Solutions built on an RDBMS often do not scale well,
so users look to migrate to big data-powered solutions. The first priority in such a
migration is moving existing production data to another database or file system. This
is where Apache Sqoop comes in very handy: it helps to migrate data easily from one
database to another.
Now, let's explore each one of these tools in detail. We'll start with Apache Pig.
Apache Pig
Apache Pig is a platform that provides a simple scripting language, known as Pig Latin,
for building MapReduce programs at a higher level of abstraction. It was initially
developed as part of Yahoo's research work and moved to the Apache Incubator in 2007.
It is named Pig because it can ingest (read) data in almost any format.