Table 16-1. Pig Latin relational operators
Category                 Operator            Description
-----------------------  ------------------  ---------------------------------------------------------------
Loading and storing      LOAD                Loads data from the filesystem or other storage into a relation
                         STORE               Saves a relation to the filesystem or other storage
                         DUMP (\d)           Prints a relation to the console
Filtering                FILTER              Removes unwanted rows from a relation
                         DISTINCT            Removes duplicate rows from a relation
                         FOREACH...GENERATE  Adds or removes fields to or from a relation
                         MAPREDUCE           Runs a MapReduce job using a relation as input
                         STREAM              Transforms a relation using an external program
                         SAMPLE              Selects a random sample of a relation
                         ASSERT              Ensures a condition is true for all rows in a relation; otherwise, fails
Grouping and joining     JOIN                Joins two or more relations
                         COGROUP             Groups the data in two or more relations
                         GROUP               Groups the data in a single relation
                         CROSS               Creates the cross product of two or more relations
                         CUBE                Creates aggregations for all combinations of specified columns in a relation
Sorting                  ORDER               Sorts a relation by one or more fields
                         RANK                Assigns a rank to each tuple in a relation, optionally sorting by fields first
                         LIMIT               Limits the size of a relation to a maximum number of tuples
Combining and splitting  UNION               Combines two or more relations into one
                         SPLIT               Splits a relation into two or more relations
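As a rough illustration of how several of these operators combine, the following Pig Latin sketch chains one or two operators from most of the categories in the table. The input path, output path, field names, and the sentinel value 9999 are all hypothetical, chosen only for the example; the input is assumed to be tab-delimited, which is what LOAD expects when no load function is given.

```pig
-- Hypothetical input: tab-separated (year, temperature, quality) records.
records = LOAD 'input/sample.txt'
    AS (year:chararray, temperature:int, quality:int);

-- FILTER: drop rows with a missing reading (9999 is an assumed sentinel)
-- or an unwanted quality code.
good_records = FILTER records BY temperature != 9999
    AND (quality == 0 OR quality == 1);

-- GROUP: collect the remaining rows by year.
grouped = GROUP good_records BY year;

-- FOREACH...GENERATE: reduce each group to a (year, max_temp) tuple.
max_temps = FOREACH grouped GENERATE
    group AS year,
    MAX(good_records.temperature) AS max_temp;

-- ORDER and LIMIT: keep the three hottest years.
sorted_temps = ORDER max_temps BY max_temp DESC;
hottest = LIMIT sorted_temps 3;

-- DUMP prints to the console; STORE writes to the filesystem instead.
DUMP hottest;
STORE max_temps INTO 'output/max_temps';
```

Each assignment only names a relation. In this sketch nothing is expected to run until Pig reaches the DUMP or STORE statement at the end.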
There are other types of statements that are not added to the logical plan. For example, the
diagnostic operators (DESCRIBE, EXPLAIN, and ILLUSTRATE) are provided to allow the user to
interact with the logical plan for debugging purposes (see Table 16-2).
DUMP is a sort of diagnostic operator, too, since it is used only to allow interactive debugging
of small result sets or in combination with LIMIT to retrieve a few rows from a lar-