open source community to develop Pig into a production-quality product.
The open source community embraced the project, and additional features
were added such as error handling, a streaming operator, parameter
substitution, and binary comparators. Eventually, the entire codebase was
rewritten to provide significant performance increases. With the growing
popularity of Hadoop and Pig, the open source community continues to
improve and augment Pig.
Today, Pig is a high-level scripting platform for creating map-reduce jobs
that run on a Hadoop cluster. Pig scripts are written in a language called
Pig Latin, which provides an easier way to express data-processing
instructions. In the following sections,
you'll learn more about Pig, including when to use Pig and how to use both
built-in and user-defined functions.
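To give a flavor of Pig Latin before the details, here is a minimal sketch of a script; the file path and field names are illustrative assumptions, not from the text:

```pig
-- Load a tab-delimited log file (path and schema are assumed for illustration)
logs = LOAD 'input/access_log' AS (user:chararray, url:chararray, bytes:int);

-- Keep only the large responses
big = FILTER logs BY bytes > 10000;

-- Write the filtered relation back to the cluster
STORE big INTO 'output/big_responses';
```

Each statement names an intermediate relation, which is what makes the step-by-step style discussed below possible.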
The Difference Between Pig and Hive
The main difference between Pig and Hive is that Pig Latin, Pig's
scripting language, is a procedural language, whereas HiveQL is a
declarative language. This means that when using Pig Latin you have
more control over how the data is processed through the pipeline, and
the processing consists of a series of steps, in between which the data
can be checked and stored. With HiveQL, you construct and run the
statement as a whole, submitting it to a query engine to optimize and
run the code. You have very little influence on the steps performed to
achieve the result. Instead, you have faith that the query engine will
choose the most efficient steps needed. If you have a programming
background, you are probably more comfortable with and like the
control you get using Pig Latin. However, if you have a lot of experience
writing database queries, you will most likely feel more comfortable
with HiveQL.
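The contrast can be sketched with a simple aggregation; the file name and fields are assumptions for illustration:

```pig
-- Pig Latin builds the result as a pipeline of named steps:
users   = LOAD 'input/users' AS (name:chararray, age:int);
adults  = FILTER users BY age >= 18;
grouped = GROUP adults BY age;
counts  = FOREACH grouped GENERATE group AS age, COUNT(adults) AS n;
-- Any intermediate relation (adults, grouped) can be DUMPed or STOREd here.
STORE counts INTO 'output/age_counts';

-- The roughly equivalent HiveQL is a single declarative statement,
-- planned and optimized as a whole by the query engine:
--   SELECT age, COUNT(*) FROM users WHERE age >= 18 GROUP BY age;
```

In the Pig Latin version you decide the order of the steps and can inspect the data between them; in the HiveQL version the engine decides.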
When to Use Pig
It is important that you use the right tool for the job. Although we have all
used the side of a wrench to hammer in a nail, a hammer works much better!
Pig is designed and tuned to process large data sets involving a number of
steps. As such, it is primarily an extraction transform load (ETL) tool. In