Data Type    Description
{(Vivek,Apress,6),(Chris,Apress,Reviewer,6),(Melissa,Apress,Coordinator,6),(Brian,Apress,Tech reviewer)}
Map
A map is a collection of key-value pairs enclosed in square brackets. Pairs are separated by commas, and within each pair the key and value are separated by #. For example, [name#vivek,email#vivek.mishra@nomail.com] is a map containing two key-value pairs. The first pair has name as its key and vivek as its value, whereas the second has email as its key and vivek.mishra@nomail.com as its value.
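As a minimal sketch of how such a map might be read and queried, the following Pig Latin script loads one map per line and looks up values by key with the # operator. The file name users.txt and the aliases users, info, and contacts are assumptions made for illustration; they do not come from the text above.

```pig
-- users.txt is a hypothetical file; each line holds one map literal, e.g.
-- [name#vivek,email#vivek.mishra@nomail.com]
users = LOAD 'users.txt' AS (info:map[]);

-- map#'key' returns the value stored under that key (null if the key is absent)
contacts = FOREACH users GENERATE info#'name' AS name, info#'email' AS email;

DUMP contacts;
```

Because the map is declared without a value type (map[]), the looked-up values are treated as bytearray unless they are cast explicitly.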
Pig Functions
Pig comes with several built-in functions. Users can also implement custom user-defined functions (UDFs). Built-in Pig functions can be further categorized as:
• Eval functions
• Math functions
• String functions
• Store functions
• Bag/tuple functions
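To make the categories concrete, here is a short sketch that applies one string function (UPPER), one math function (ROUND), and one eval function (COUNT). The input file authors.txt, its three-column layout, and all alias names are hypothetical and chosen only for this illustration.

```pig
-- authors.txt is a hypothetical tab-separated file: name, publisher, rating
authors = LOAD 'authors.txt' AS (name:chararray, publisher:chararray, rating:double);

-- String function UPPER and math function ROUND, applied to each row
formatted = FOREACH authors GENERATE UPPER(name) AS name, ROUND(rating) AS rating;

-- Eval function COUNT, applied to the bag that GROUP builds for each publisher
by_publisher = GROUP authors BY publisher;
counts = FOREACH by_publisher GENERATE group AS publisher, COUNT(authors) AS total;

DUMP formatted;
DUMP counts;
```

Note that COUNT operates on a bag, which is why the relation is grouped first.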
In this section, let's look at some of the most commonly used Pig functions.
PigStorage
The default function to load data in UTF-8 format. Examples of PigStorage are in the next two sections.
LOAD
The LOAD function loads data from the file system. By default it is used in conjunction with the PigStorage function, as shown in the following steps:
1. Create a sample tweets.txt file in a folder on the local file system (in this example, /home/vivek). The file will contain data in the following format: