There are a few restrictions that can trip up the uninitiated. For example, it's not possible to
create a relation from a bag literal, so the following statement fails:
A = {(1,2),(3,4)}; -- Error
The simplest workaround in this case is to load the data from a file using the LOAD statement.
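As a minimal sketch of that workaround, assume the two tuples are stored in a hypothetical tab-delimited file, input/pairs.txt, with one pair per line. The default loader (PigStorage) splits fields on tabs, so loading the file produces the relation the bag literal was meant to create. The file name and the field names f0 and f1 are illustrative only.

```pig
-- input/pairs.txt is a hypothetical file containing two lines: "1<TAB>2" and "3<TAB>4"
A = LOAD 'input/pairs.txt' AS (f0:int, f1:int);
-- Print the contents of A to the console
DUMP A;
```

Given that file, DUMP A should print the tuples (1,2) and (3,4).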
As another example, you can't treat a relation like a bag and project a field into a new relation ($0 refers to the first field of A, using the positional notation):
B = A.$0;
Instead, you have to use a relational operator to turn the relation A into relation B:
B = FOREACH A GENERATE $0;
It's possible that a future version of Pig Latin will remove these inconsistencies and treat
relations and bags in the same way.
Schemas
A relation in Pig may have an associated schema, which gives the fields in the relation
names and types. We've seen how an AS clause in a LOAD statement is used to attach a
schema to a relation:
grunt> records = LOAD 'input/ncdc/micro-tab/sample.txt'
>> AS (year:int, temperature:int, quality:int);
grunt> DESCRIBE records;
records: {year: int,temperature: int,quality: int}
This time we've declared the year to be an integer rather than a chararray, even
though the file it is being loaded from is the same. An integer may be more appropriate if
we need to manipulate the year arithmetically (to turn it into a timestamp, for example),
whereas the chararray representation might be more appropriate when it's being used
as a simple identifier. Pig's flexibility in the degree to which schemas are declared contrasts
with schemas in traditional SQL databases, which are declared before the data is
loaded into the system. Pig is designed for analyzing plain input files with no associated
type information, so it is quite natural to choose types for fields later than you would with
an RDBMS.
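To illustrate choosing a type late, here is a sketch (not taken from the text above) that keeps year as a chararray for use as an identifier and casts it to an int only in the step that needs arithmetic. The relation name decades and the decade calculation are hypothetical; Pig's integer division truncates, so the expression rounds each year down to its decade.

```pig
-- Keep year as a chararray, as you might when it serves as an identifier
records = LOAD 'input/ncdc/micro-tab/sample.txt'
    AS (year:chararray, temperature:int, quality:int);
-- Cast to int only where arithmetic is needed (hypothetical decade bucketing)
decades = FOREACH records GENERATE year, ((int)year / 10) * 10 AS decade;
DESCRIBE decades;
```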
It's possible to omit type declarations completely, too:
grunt> records = LOAD 'input/ncdc/micro-tab/sample.txt'
>> AS (year, temperature, quality);
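When a field is given no type, Pig assigns it the default type bytearray. Running DESCRIBE on the untyped relation should reflect that:

```pig
grunt> DESCRIBE records;
records: {year: bytearray,temperature: bytearray,quality: bytearray}
```

A bytearray field can still be cast to a concrete type, such as (int)temperature, at the point where an operation needs one.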