NOTE
Pig scripts are generally saved with a .pig extension. This is a
convention, not a requirement, but it does make it easier for
people to find and use your scripts.
Another key aspect of Pig Latin is that its statements are declarative rather
than imperative. That is, they tell Pig what you intend to do, but the Pig
engine determines the best way to accomplish the operation. It may
rearrange or combine certain operations to produce a more efficient plan
for accomplishing the work. This is similar to the way SQL Server's query
optimizer may rewrite your SQL queries to get the results in the fastest way
possible.
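The following hypothetical script shows what this rearranging can look like. The FILTER is written after the JOIN, but the filter references only fields from customers. Pig's optimizer may therefore apply it before the join, which reduces the amount of data the join has to process. The file names, field names, and schemas are assumptions for illustration. Running EXPLAIN on the final relation is one way to check which plan Pig actually chose.

```pig
-- Hypothetical tab-delimited inputs; names and schemas are assumptions.
orders    = LOAD 'orders.tsv'    AS (order_id:int, customer_id:int, amount:double);
customers = LOAD 'customers.tsv' AS (customer_id:int, region:chararray);

-- The join is written first and the filter second...
joined = JOIN orders BY customer_id, customers BY customer_id;
west   = FILTER joined BY customers::region == 'West';

-- ...but because the filter only touches customers' fields, the optimizer
-- may move it ahead of the join so fewer rows flow into the join.
STORE west INTO 'west_orders';
```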
Several operators help with debugging Pig Latin. One of the most useful is DUMP,
which outputs the contents of the specified relation to the screen. If the
relation contains a large amount of data, though, DUMP can take a
prohibitively long time to execute:
DUMP source;
DESCRIBE outputs the schema of a relation to the console. This can help you
understand what the relation looks like after various transformations have
been applied:
DESCRIBE grouped;
EXPLAIN shows the planned execution model for producing the specified
relation. This outputs the logical, physical, and MapReduce plans to the
console:
EXPLAIN filtered;
ILLUSTRATE shows, step by step, how the data for a given relation is produced.
This differs from the plan in that it actually displays sample data in each
relation at each step:
ILLUSTRATE grouped;
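The debugging examples above refer to relations named source, filtered, and grouped without defining them. The following is a minimal sketch of a script that might produce them. The input file sales.tsv, its fields, and the totals relation are hypothetical. The statements can be entered one at a time in Pig's interactive Grunt shell. DUMP is applied to the small aggregated relation rather than to source, in keeping with the caution about large relations.

```pig
-- Hypothetical tab-delimited input; file name and fields are assumptions.
source   = LOAD 'sales.tsv' AS (region:chararray, product:chararray, amount:double);
filtered = FILTER source BY amount > 100.0;
grouped  = GROUP filtered BY region;
totals   = FOREACH grouped GENERATE group AS region, SUM(filtered.amount) AS total;

DESCRIBE grouped;    -- schema of the grouped relation
EXPLAIN filtered;    -- logical, physical, and MapReduce plans for filtered
ILLUSTRATE grouped;  -- sample rows at each step leading to grouped
DUMP totals;         -- triggers execution and prints every tuple of totals
```

DESCRIBE grouped should report a group field (the region) and a bag named filtered that holds the original tuples. This is a quick way to confirm what GROUP produced before you aggregate over it.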