FlightData = LOAD 'FlightPerformance.csv' using
PigStorage(',');
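The later examples refer to fields by name (AirportCode, FlightDelay), which only works if the relation has a schema. As a minimal sketch, a schema can be declared with an AS clause on the LOAD statement. The column list and types below are assumptions for illustration; the source does not describe the actual layout of FlightPerformance.csv.

```pig
-- Field names and types are assumed; adjust them to match the real file.
FlightData = LOAD 'FlightPerformance.csv' USING PigStorage(',')
    AS (AirportCode:chararray, FlightDelay:double);

-- Print the schema Pig has associated with the relation.
DESCRIBE FlightData;
```

Without an AS clause, fields can still be referenced by position ($0, $1, and so on).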
Another built-in load function is JsonLoader. This is used to load JSON
(JavaScript Object Notation)-formatted files. JSON is a text-based open
standard designed for human-readable data interchange and is often used
to transmit structured data over network connections. The following code
loads a JSON-formatted file:
FlightData = LOAD 'FlightPerformance.json' using
JsonLoader();
NOTE
For more information on JSON, see http://www.json.org/.
You can also store data using the storage functions. For example, the
following code stores data into a tab-delimited text file (the default format):
STORE FlightData into 'FlightDataProcessed' using
PigStorage();
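PigStorage also accepts a delimiter argument when storing, just as it does when loading. The following sketch writes comma-delimited output instead of the tab-delimited default; the output location 'FlightDataProcessedCsv' is a hypothetical name chosen for this example.

```pig
-- Load step repeated so the snippet is self-contained.
FlightData = LOAD 'FlightPerformance.csv' USING PigStorage(',');

-- 'FlightDataProcessedCsv' is a hypothetical output directory name.
STORE FlightData INTO 'FlightDataProcessedCsv' USING PigStorage(',');
```

Pig writes the result as a directory of part files rather than a single file, and the job fails if the output location already exists.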
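Functions used to evaluate and aggregate data include IsEmpty, SIZE,
COUNT, SUM, AVG, and CONCAT, to name a few. Function names in Pig are
case sensitive, so they must be written exactly as shown. The following
code filters out tuples that have an empty airport code (IsEmpty operates
on bags and maps, so this assumes AirportCode is one of those types):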
FlightDataFiltered = FILTER FlightData BY
NOT IsEmpty(AirportCode);
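The aggregate functions listed above, such as COUNT and AVG, are typically applied to the bags produced by a GROUP statement. The following is a minimal sketch of that pattern, assuming a hypothetical two-column schema (AirportCode, FlightDelay) that the source does not confirm.

```pig
-- Schema is assumed for illustration only.
FlightData = LOAD 'FlightPerformance.csv' USING PigStorage(',')
    AS (AirportCode:chararray, FlightDelay:double);

-- Group the tuples by airport so each aggregate runs once per group.
ByAirport = GROUP FlightData BY AirportCode;

-- COUNT takes the grouped bag; AVG takes one column projected from it.
DelayStats = FOREACH ByAirport GENERATE
    group AS AirportCode,
    COUNT(FlightData) AS NumFlights,
    AVG(FlightData.FlightDelay) AS AvgDelay;

DUMP DelayStats;
```

Each output tuple holds an airport code, the number of flights for that airport, and the average delay.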
Common math functions include ABS, CEIL, SIN, and TAN. The following
code uses the CEIL function to round the delay times up to the nearest
minute (integer):
FlightDataCeil = FOREACH FlightData
GENERATE CEIL(FlightDelay) AS FlightDelay2;