optimizer hints. In general, automatic query optimization has its limits, especially
with uncatalogued data, prevalent user-defined functions, and parallel execution,
which are all features of the data analysis tasks targeted by the MapReduce framework.
Figure 2.11 shows an example SQL query and its equivalent Pig Latin program.
Given a URL table with the structure (url, category, pagerank), the task of the
SQL query is to find each large category and the average pagerank of its high-pagerank
URLs (pagerank > 0.2). A Pig Latin program is described as a sequence of steps, where each
step represents a single data transformation. This characteristic is appealing to many
programmers. At the same time, the transformation steps are described using high-level
primitives (e.g., filtering, grouping, aggregation), much like in SQL.
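The step-by-step style described above can be imitated in plain Python. The following is only a minimal sketch of the same dataflow (filter, group, filter, aggregate), not Pig Latin itself and not the program from Figure 2.11. The sample rows, the function name `large_category_avg_pagerank`, and the tiny group-size threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows with the (url, category, pagerank) structure.
urls = [
    ("a.example", "news", 0.75),
    ("b.example", "news", 0.25),
    ("c.example", "news", 0.125),   # dropped by the pagerank filter
    ("d.example", "sports", 0.5),   # its category ends up too small
]

def large_category_avg_pagerank(rows, min_pagerank=0.2, min_group_size=2):
    """Average pagerank of high-pagerank URLs for each large category.

    min_group_size is an assumed stand-in for whatever "large" means
    in the real query; it is kept tiny so the sample data exercises it.
    """
    # Step 1: keep only high-pagerank URLs.
    good_urls = [row for row in rows if row[2] > min_pagerank]

    # Step 2: group the remaining URLs by category.
    groups = defaultdict(list)
    for _url, category, pagerank in good_urls:
        groups[category].append(pagerank)

    # Step 3: keep only the large groups.
    big_groups = {c: ranks for c, ranks in groups.items()
                  if len(ranks) >= min_group_size}

    # Step 4: compute the average pagerank per remaining category.
    return {c: sum(ranks) / len(ranks) for c, ranks in big_groups.items()}

result = large_category_avg_pagerank(urls)
print(result)  # {'news': 0.5}
```

Each intermediate name holds the result of exactly one transformation, which mirrors the "one step, one transformation" structure the text attributes to Pig Latin programs.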
Pig Latin has several other features that are important for casual ad hoc data
analysis tasks. These features include support for a flexible, fully nested data model,
extensive support for user-defined functions, and the ability to operate over plain
input files without any schema information [56]. In particular, Pig Latin has a simple
data model consisting of the following four types:
1. Atom: An atom contains a simple atomic value such as a string or a number,
for example, “alice.”
2. Tuple: A tuple is a sequence of fields, each of which can be any of the data
types, for example, (“alice,” “lakers”).
3. Bag: A bag is a collection of tuples with possible duplicates. The schema of
the constituent tuples is flexible where not all tuples in a bag need to have
the same number and type of fields, for example,
{(“alice,” “lakers”), (“alice,” (“iPod,” “apple”))}.
4. Map: A map is a collection of data items, where each item has an associated key
through which it can be looked up. As with bags, the schema of the constituent
data items is flexible. However, the keys are required to be data atoms,
for example, [“k1” → (“alice,” “lakers”), “k2” → “20”].
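As a rough illustration, the four data types can be mirrored with built-in Python values. This is a sketch under stated assumptions rather than how Pig represents data internally: a bag is modeled here as a Python list (which is ordered, unlike a bag), a map as a dict, and the helper `kind` is a hypothetical name introduced only for this example.

```python
# Atom: a simple atomic value such as a string or a number.
atom = "alice"

# Tuple: a sequence of fields, each of which can be any of the data types.
tup = ("alice", "lakers")

# Bag: a collection of tuples; duplicates are allowed, and the tuples
# need not share the same number or type of fields.
bag = [("alice", "lakers"), ("alice", ("iPod", "apple"))]

# Map: data items looked up by key; keys must be atoms, values are flexible.
record_map = {"k1": ("alice", "lakers"), "k2": "20"}

def kind(value):
    """Classify a Python value under this sketch's encoding of the data model."""
    if isinstance(value, (str, int, float)):
        return "atom"
    if isinstance(value, tuple):
        return "tuple"
    if isinstance(value, list) and all(isinstance(t, tuple) for t in value):
        return "bag"
    if isinstance(value, dict) and all(kind(k) == "atom" for k in value):
        return "map"
    raise TypeError(f"not a value of the sketched data model: {value!r}")

for v in (atom, tup, bag, record_map):
    print(kind(v))
```

Because a tuple field may itself be a tuple, bag, or map, values can nest to arbitrary depth, which is what "fully nested" refers to in the text.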
FIGURE 2.11 An example SQL query and its equivalent Pig Latin program. (From A. Gates
et al., PVLDB, 2(2), 1414-1425, 2009.)