Table 16-1. Pig Latin relational operators
Category                  Operator             Description
Loading and storing       LOAD                 Loads data from the filesystem or other storage into a relation
                          STORE                Saves a relation to the filesystem or other storage
                          DUMP (\d)            Prints a relation to the console
Filtering                 FILTER               Removes unwanted rows from a relation
                          DISTINCT             Removes duplicate rows from a relation
                          FOREACH...GENERATE   Adds or removes fields to or from a relation
                          MAPREDUCE            Runs a MapReduce job using a relation as input
                          STREAM               Transforms a relation using an external program
                          SAMPLE               Selects a random sample of a relation
                          ASSERT               Ensures a condition is true for all rows in a relation; otherwise, fails
Grouping and joining      JOIN                 Joins two or more relations
                          COGROUP              Groups the data in two or more relations
                          GROUP                Groups the data in a single relation
                          CROSS                Creates the cross product of two or more relations
                          CUBE                 Creates aggregations for all combinations of specified columns in a relation
Sorting                   ORDER                Sorts a relation by one or more fields
                          RANK                 Assigns a rank to each tuple in a relation, optionally sorting by fields first
                          LIMIT                Limits the size of a relation to a maximum number of tuples
Combining and splitting   UNION                Combines two or more relations into one
                          SPLIT                Splits a relation into two or more relations
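
As a rough illustration of how several of these operators might be chained together, here is a minimal sketch of a Pig Latin script. The input path, output path, field names, and the 9999 "missing value" marker are all assumptions made for the example; they are not taken from the table above.

```pig
-- Hypothetical input: tab-separated lines of (year, temperature)
records = LOAD 'input/temperatures.txt'
    AS (year:chararray, temperature:int);

-- FILTER: drop rows with the assumed missing-value marker
valid = FILTER records BY temperature != 9999;

-- GROUP: collect the rows of a single relation by year
by_year = GROUP valid BY year;

-- FOREACH...GENERATE: project one output row per group
max_temps = FOREACH by_year GENERATE
    group AS year,
    MAX(valid.temperature) AS max_temp;

-- ORDER and LIMIT: keep the ten hottest years
sorted = ORDER max_temps BY max_temp DESC;
hottest = LIMIT sorted 10;

-- STORE: write the result to the (hypothetical) output directory
STORE hottest INTO 'output/hottest_years';
```

Each statement here names a new relation built from an earlier one. Replacing the final STORE with DUMP would print the rows to the console instead of writing them to storage.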
There are other types of statements that are not added to the logical plan. For example, the diagnostic operators (DESCRIBE, EXPLAIN, and ILLUSTRATE) are provided to allow the user to interact with the logical plan for debugging purposes (see Table 16-2). DUMP is a sort of diagnostic operator, too, since it is used only to allow interactive debugging of small result sets or in combination with LIMIT
to retrieve a few rows from a lar-