Data Migration and Analytics - Beginning Apache Cassandra Development


come in very handy for such needs. Before we explore these tools in detail, let's briefly introduce each of them.

Apache Pig allows programmers to write MapReduce implementations in the form of scripts. Pig translates each script into Hadoop-compatible MapReduce jobs. Its built-in functions and data types make it quick to write reusable, Pig-powered MapReduce implementations. People building data pipelines or ETL-type solutions often prefer Pig because it is procedural rather than declarative: a script describes the workflow step by step, so you can create checkpoints and plug in custom code at any point.
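To make the procedural style concrete, here is a minimal Python sketch. It is not Pig Latin and not how Pig is implemented; it only mimics a load, filter, group, and count pipeline of the kind a Pig script expresses. The record layout, the function names, and the `checkpoint` helper are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical input: (user, page, seconds_on_page) records.
RECORDS = [
    ("alice", "/home", 12),
    ("bob", "/home", 3),
    ("alice", "/cart", 40),
    ("carol", "/home", 25),
    ("bob", "/cart", 1),
]

# Intermediate results saved by name (illustrative stand-in for checkpoints).
checkpoints = {}


def checkpoint(name, rows):
    """Save an intermediate result so a later step can be inspected or rerun."""
    checkpoints[name] = list(rows)
    return checkpoints[name]


def map_phase(rows):
    """Emit one (page, 1) pair per record, as a mapper would."""
    for _user, page, _seconds in rows:
        yield page, 1


def reduce_phase(pairs):
    """Group the pairs by key and sum their values, as a reducer would."""
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)


def run_pipeline(rows, min_seconds=5):
    # Step 1: load (the in-memory list stands in for a file).
    loaded = checkpoint("loaded", rows)
    # Step 2: filter; custom code can be plugged in at a step like this one.
    filtered = checkpoint("filtered", [r for r in loaded if r[2] >= min_seconds])
    # Step 3: group and count, expressed as map and reduce phases.
    return reduce_phase(map_phase(filtered))


visits = run_pipeline(RECORDS)
print(visits)
```

Because each step is an explicit statement, the intermediate `filtered` result can be examined or replaced without touching the rest of the workflow, which is the property the paragraph above attributes to a procedural tool.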

Apache Hive enables users to manage and analyze large data sets using an SQL-like query language. Because SQL is popular and widely used across the industry, Hive lets programmers quickly adopt big data platforms such as Hadoop and HBase through a familiar query interface, namely Hive Query Language (Hive QL). It is generally used for ad hoc SQL-based analytics. With Hive QL we can perform various DDL and DML operations in an SQL-like manner. Data definition language (DDL) is used for tasks like creating and altering tables, and data manipulation language (DML) is used for things like inserting and deleting records. DDL and DML semantics are similar to SQL's. You can refer to https://cwiki.apache.org/confluence/dis- for more information about DDL. Hive's data partitioning and external table support give users the added advantage of declaring and analyzing data that resides on external file systems. We will cover this in a later part of this chapter.
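The DDL and DML distinction can be illustrated with a short, runnable sketch. Since running Hive needs a Hadoop setup, the example below uses Python's built-in `sqlite3` module as a stand-in, on the assumption that the SQL shown is close enough in spirit to Hive QL; details such as column types and storage clauses differ in Hive. The `users` table and its contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a Hive session.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: statements that define or change the structure of a table.
cur.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
cur.execute("ALTER TABLE users ADD COLUMN country TEXT")

# DML: statements that change the data stored in the table.
cur.executemany(
    "INSERT INTO users (user_id, name, country) VALUES (?, ?, ?)",
    [(1, "Asha", "IN"), (2, "Ben", "US"), (3, "Chen", "CN"), (4, "Dev", "IN")],
)
cur.execute("DELETE FROM users WHERE country = ?", ("US",))
conn.commit()

# An ad hoc analytic query: count the remaining users per country.
rows = cur.execute(
    "SELECT country, COUNT(*) FROM users GROUP BY country ORDER BY country"
).fetchall()
print(rows)
```

The first two statements change only the table definition, while the insert and delete statements change only its rows; the final query is the kind of ad hoc aggregation the paragraph above has in mind.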

Sqoop stands for SQL to Hadoop. Solutions built on an RDBMS often do not scale easily, so users look to migrate to solutions powered by big data technologies. The first priority in such a migration is moving existing production data to another database or file system. This is where Apache Sqoop comes in handy, as it helps migrate data from one database to another.
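The kind of transfer described above can be pictured with a small Python sketch. This is not Sqoop, which is a command-line tool that runs its transfers on Hadoop; it is only a hand-written illustration of the underlying idea of reading rows from a relational table and writing them out as delimited text. The `orders` table, the `export_table` helper, and the in-memory buffer standing in for a target file are all hypothetical.

```python
import csv
import io
import sqlite3


def export_table(conn, table, out):
    """Write every row of `table` to `out` as comma-delimited text.

    `table` is assumed to be a trusted, hard-coded name, since it is
    interpolated directly into the query. Returns the number of rows written.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Source: a relational table (in-memory SQLite stands in for the RDBMS).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, item TEXT, qty INTEGER)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "book", 2), (2, "pen", 10)],
)

# Target: a text buffer standing in for a file on the destination file system.
buffer = io.StringIO()
exported = export_table(source, "orders", buffer)
print(buffer.getvalue())
```

A dedicated migration tool adds what this sketch leaves out, such as connecting to production databases, splitting the work across parallel tasks, and mapping column types to the target system.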

Now, let's explore each one of these tools in detail. We'll start with Apache Pig.

Apache Pig

Apache Pig is a platform that provides a simple scripting language, known as Pig Latin, for building MapReduce programs at a higher level of abstraction. It was initially developed as part of Yahoo's research-related work and moved to the Apache Incubator in 2007. It is named Pig because it can ingest (read) data in almost any format.
