How are schemas propagated to new relations? Some relational operators don't change the
schema, so the relation produced by the LIMIT operator (which restricts a relation to a
maximum number of tuples), for example, has the same schema as the relation it operates
on. For other operators, the situation is more complicated. UNION, for example, combines
two or more relations into one and tries to merge the input relations' schemas. If the
schemas are incompatible, due to different types or number of fields, then the schema of
the result of the UNION is unknown.
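The propagation rule described above can be modelled in a few lines of Python. This is a deliberately simplified sketch, not Pig's actual implementation: the names limit_schema and union_schema are hypothetical, a schema is represented as a list of (name, type) pairs with None standing for "unknown", and taking field names from the first input is an assumption made only for illustration.

```python
from typing import List, Optional, Tuple

# A schema is a list of (field name, type name) pairs; None means "unknown".
Schema = Optional[List[Tuple[str, str]]]


def limit_schema(input_schema: Schema) -> Schema:
    # LIMIT only restricts the number of tuples, so the schema passes through.
    return input_schema


def union_schema(*schemas: Schema) -> Schema:
    # Simplified rule from the text: if the inputs differ in number of fields
    # or in field types, the result schema is unknown (None).
    if not schemas or any(s is None for s in schemas):
        return None
    first = schemas[0]
    for other in schemas[1:]:
        if len(other) != len(first):
            return None  # different number of fields
        if any(a[1] != b[1] for a, b in zip(first, other)):
            return None  # different field types
    # Assumption for illustration: keep the first input's field names.
    return list(first)
```

In this model, the union of two two-field relations with matching types keeps a schema, while adding a relation with a differently typed or missing field makes the result schema unknown.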
You can find out the schema for any relation in the data flow using the DESCRIBE
operator. If you want to redefine the schema for a relation, you can use the
FOREACH...GENERATE operator with AS clauses to define the schema for some or all
of the fields of the input relation.
See User-Defined Functions for a further discussion of schemas.
Functions
Functions in Pig come in four types:
Eval function
A function that takes one or more expressions and returns another expression. An
example of a built-in eval function is MAX, which returns the maximum value of the
entries in a bag. Some eval functions are aggregate functions, which means they
operate on a bag of data to produce a scalar value; MAX is an example of an aggregate
function. Furthermore, many aggregate functions are algebraic, which means that the
result of the function may be calculated incrementally. In MapReduce terms, algebraic
functions make use of the combiner and are much more efficient to calculate (see
Combiner Functions). MAX is an algebraic function, whereas a function to calculate the
median of a collection of values is an example of a function that is not algebraic.
Filter function
A special type of eval function that returns a Boolean result. As the name suggests,
filter functions are used in the FILTER operator to remove unwanted rows. They can
also be used in other relational operators that take Boolean conditions and, in
general, anywhere a Boolean or conditional expression is expected. An example of a
built-in filter function is IsEmpty, which tests whether a bag or a map contains any
items.
Load function
A function that specifies how to load data into a relation from external storage.
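
The distinction between algebraic and non-algebraic functions drawn under "Eval function" above can be illustrated outside Pig. The following Python sketch assumes the input is split into chunks (standing in for map-side splits), reduces each chunk to a partial result, and then combines the partial results. The helper names chunks, max_via_partials, and median_via_partials are hypothetical and are not part of any Pig API.

```python
from statistics import median


def chunks(values, size):
    """Split values into consecutive chunks, standing in for map-side splits."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def max_via_partials(values, size):
    # Combiner-style step: reduce each chunk to one partial maximum.
    partials = [max(chunk) for chunk in chunks(values, size)]
    # Final step: the maximum of the partial maxima is the global maximum.
    return max(partials)


def median_via_partials(values, size):
    # The same trick applied to the median. In general this does NOT
    # give the true median, which is why median is not algebraic.
    partials = [median(chunk) for chunk in chunks(values, size)]
    return median(partials)


data = [1, 2, 3, 4, 100, 101, 102, 103, 104]

print(max_via_partials(data, 4), max(data))        # both are 104
print(median_via_partials(data, 4), median(data))  # 101.5 versus 100
```

Combining partial maxima always reproduces the true maximum, so the work can be pushed into a combiner. Combining partial medians does not reproduce the true median for this input, so a median function needs to see all the values together.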