Pig Latin statements can be executed in either Local or MapReduce mode, and in either case
interactively or as batch programs:
In Local mode, Pig runs in a single JVM and accesses the local file system. This mode is suitable
only for small data sets and can be run on minimal infrastructure.
In MapReduce mode, Pig translates programs (queries and statements) into MapReduce jobs and
runs them on a Hadoop cluster. Production environments for running Pig are deployed in this
mode.
Pig data types
The Pig language supports the following data types:
Scalar types: int, long, float, double, chararray, bytearray
Complex types:
map: associative array of key/value pairs; keys are chararray, values may be of any type
tuple: ordered list of fields; each field may be of any scalar or complex type
bag: unordered collection of tuples
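As an illustration of how these types appear in a schema, here is a minimal Pig Latin sketch. The file name students.txt and every field name are hypothetical, and the typed map syntax map[chararray] assumes a reasonably recent Pig release.

```pig
-- Hypothetical input file and field names, for illustration only.
-- Each loaded record is a tuple that mixes scalar and complex fields:
-- name, age and gpa are scalars, grades is a bag of tuples, contact is a map.
students = LOAD 'students.txt' AS (
    name:chararray,
    age:int,
    gpa:double,
    grades:bag{g:tuple(course:chararray, mark:int)},
    contact:map[chararray]
);

-- '#' looks up a map value by key; '.' projects a field out of the tuples in a bag.
info = FOREACH students GENERATE name, contact#'email' AS email, grades.mark AS marks;

-- DESCRIBE prints the schema of a relation without running a job.
DESCRIBE info;
```

DESCRIBE is a convenient way to check which type Pig has assigned to each field before running anything expensive.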
Running Pig programs
Pig programs can be run in three ways, all of which work in both Local and MapReduce modes (for
more details see the Apache Pig site at http://pig.apache.org/).
Script: a Pig program can be saved as a script file and run from the command line.
Grunt shell: an interactive shell for running Pig commands.
Embedded: you can run Pig programs from within a Java program, much as Java DataBase
Connectivity (JDBC) lets you run SQL statements from Java.
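The script and Grunt options can be sketched with one small program. The file names count.pig and input.txt are hypothetical. The invocations in the comments use Pig's -x option to select the execution mode.

```pig
-- count.pig (hypothetical file name)
--
-- Batch, Local mode:       pig -x local count.pig
-- Batch, MapReduce mode:   pig -x mapreduce count.pig
-- Interactive:             start Grunt with "pig -x local" and type the
--                          statements below at the grunt> prompt.

-- Read each line of the (hypothetical) input file as a single chararray field.
lines   = LOAD 'input.txt' AS (line:chararray);

-- Put every record into one group so that it can be counted.
grouped = GROUP lines ALL;
total   = FOREACH grouped GENERATE COUNT(lines) AS n;

-- Print the record count to the screen.
DUMP total;
```

The same statements run unchanged in either mode. Only the -x option, and where the input path is resolved (local file system or the cluster's file system), differs.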
Pig program flow
Pig provides many built-in commands and syntax constructs; here we look only at the core
execution model. A typical Pig program consists of a LOAD statement, one or more transformation
statements, and a STORE or DUMP statement:
A LOAD statement reads data from the file system.
A series of “transformation” statements processes the data.
A STORE statement writes output to the file system.
A DUMP statement displays output to the screen.
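The flow above can be sketched as a short script. The paths, field names and the one-megabyte threshold are assumptions chosen for illustration.

```pig
-- LOAD: read tab-separated records from the file system (hypothetical path and schema).
logs  = LOAD 'access_log.tsv' USING PigStorage('\t')
        AS (user:chararray, url:chararray, bytes:long);

-- Transformations: keep only large responses, then project two fields.
big   = FILTER logs BY bytes > 1048576L;
pairs = FOREACH big GENERATE user, url;

-- DUMP: display the result on the screen (useful while developing).
DUMP pairs;

-- STORE: write the result to the file system; the output directory must not already exist.
STORE pairs INTO 'output/large_requests' USING PigStorage('\t');
```

Pig only builds a plan while it reads the LOAD and transformation statements. Execution is triggered when it reaches a DUMP or STORE.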
Common Pig commands
LOAD: Read data from file system.
STORE: Write data to file system.
FOREACH: Apply expression to each record and output one or more records.
FILTER: Apply a predicate and keep only the records for which it evaluates to true.
GROUP/COGROUP: Collect records with the same key from one or more inputs.
JOIN: Join two or more inputs based on a key.
ORDER: Sort records based on a key.
DISTINCT: Remove duplicate records.
UNION: Merge two or more data sets.
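A minimal sketch that chains most of these commands together follows. The input files, their comma-separated layout and all field names are hypothetical.

```pig
-- Hypothetical inputs: two yearly order files and a customer file.
orders_a  = LOAD 'orders_2023.csv' USING PigStorage(',')
            AS (order_id:int, cust_id:int, amount:double);
orders_b  = LOAD 'orders_2024.csv' USING PigStorage(',')
            AS (order_id:int, cust_id:int, amount:double);
customers = LOAD 'customers.csv' USING PigStorage(',')
            AS (cust_id:int, name:chararray);

-- UNION merges the two order sets; DISTINCT drops exact duplicate records.
all_orders = UNION orders_a, orders_b;
uniq       = DISTINCT all_orders;

-- FILTER keeps only records whose predicate is true.
valid      = FILTER uniq BY amount > 0.0;

-- JOIN matches orders to customers on the shared key.
joined     = JOIN valid BY cust_id, customers BY cust_id;

-- After a JOIN, fields are prefixed with their source relation; rename them for clarity.
slim       = FOREACH joined GENERATE customers::name AS name, valid::amount AS amount;

-- GROUP collects records with the same key; FOREACH computes one total per group.
by_name    = GROUP slim BY name;
totals     = FOREACH by_name GENERATE group AS name, SUM(slim.amount) AS total;

-- ORDER sorts by the computed total, largest first.
ranked     = ORDER totals BY total DESC;

STORE ranked INTO 'output/customer_totals' USING PigStorage(',');
```

Each statement assigns its result to a new relation name. That makes it easy to DUMP or DESCRIBE any intermediate relation while developing the script.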