When you need to operate on columns, you can use the FOREACH operator.
It evaluates the expressions that follow GENERATE once for each record in
the relation. In a grouped relation, each record holds one group key and the
bag of rows that share it, so an aggregate such as AVG is computed once per
group. If you want to produce an average totalpurchaseamount for each city,
you can use the following statement:
averaged = FOREACH grouped GENERATE group,
AVG(filtered.totalpurchaseamount);
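To make the grouping semantics concrete, the following is a minimal plain-Python sketch of what GROUP followed by FOREACH ... GENERATE with AVG computes. It is a model of the behavior, not Pig code: the sample rows and the helper name average_by_city are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like the filtered relation:
# (name, city, state, postalcode, totalpurchaseamount)
filtered = [
    ("Ann", "Miami", "FL", "33101", 100.0),
    ("Bob", "Tampa", "FL", "33602", 40.0),
    ("Cy", "Miami", "FL", "33101", 300.0),
]

def average_by_city(rows):
    """Model GROUP ... BY city, then FOREACH ... GENERATE group, AVG(...)."""
    # GROUP: collect the rows for each city into a "bag" keyed by the city.
    bags = defaultdict(list)
    for row in rows:
        bags[row[1]].append(row)
    # FOREACH ... GENERATE: emit one (group, average) tuple per bag.
    return [
        (city, sum(r[4] for r in bag) / len(bag))
        for city, bag in bags.items()
    ]

averaged = average_by_city(filtered)
print(averaged)
```

Each output tuple has exactly two fields, the group key and the average, which matters when the columns are later addressed by position.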
To order the results, you can use the ORDER operator. In this case, the $1
indicates that the statement is using the ordinal column position, rather
than addressing it by name. Positions start at $0, so $1 refers to the
second column of averaged, which holds the average:
ordered = ORDER averaged BY $1 DESC;
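Pig's positional references count from zero, so in a two-column relation the group key is at position 0 and the average is at position 1. The sketch below models a descending sort by position in plain Python; the sample tuples and the helper name order_by_position are assumptions made for illustration.

```python
# Hypothetical (city, average) tuples standing in for the averaged relation.
averaged = [("Tampa", 40.0), ("Miami", 200.0), ("Orlando", 75.5)]

def order_by_position(rows, position, descending=False):
    """Sort tuples by the field at a zero-based position."""
    return sorted(rows, key=lambda row: row[position], reverse=descending)

# Position 1 is the second field (the average); position 0 is the city.
ordered = order_by_position(averaged, 1, descending=True)
print(ordered)
```

Asking for position 2 here would fail, because each tuple has only two fields.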
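To store the results, you can use the STORE operator. This lets you write the
values back to storage using the PigStorage() function, which by default
writes one record per line with tab-delimited fields. Pig treats the target
as an output directory and writes its part files inside it: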
STORE ordered INTO 'c:\SampleData\PigOutput.txt' USING
PigStorage();
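As a rough picture of the default output format, the sketch below renders tuples the way tab-delimited text storage would lay them out, one record per line. It is a plain-Python approximation only: the helper name to_delimited_lines and the sample tuples are hypothetical, and it does not reproduce how Pig itself formats every data type.

```python
def to_delimited_lines(rows, delimiter="\t"):
    """Render each tuple as one line of delimiter-separated fields."""
    return [delimiter.join(str(field) for field in row) for row in rows]

# Hypothetical (city, average) tuples standing in for the ordered relation.
ordered = [("Miami", 200.0), ("Tampa", 40.0)]

for line in to_delimited_lines(ordered):
    print(line)
```

Passing a different delimiter, such as a comma, corresponds to giving PigStorage a delimiter argument, for example PigStorage(',').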
If you take this entire set of statements together, you can see that Pig Latin
is relatively easy to read and understand. These statements could be saved
to a file as a Pig script and then executed as a batch:
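-- Load the customer records; columns are named but left untyped.
source = LOAD '/MsBigData/Customer/' USING PigStorage()
    AS (name, city, state,
        postalcode, totalpurchaseamount);
-- Keep only Florida customers (== is the equality comparison).
filtered = FILTER source BY state == 'FL';
grouped = GROUP filtered BY city;
-- One output row per city: the group key and the average purchase.
averaged = FOREACH grouped GENERATE group,
    AVG(filtered.totalpurchaseamount);
-- $1 is the second column, the average (positions start at $0).
ordered = ORDER averaged BY $1 DESC;
STORE ordered INTO 'c:\SampleData\PigOutput.txt' USING
    PigStorage();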