Database Reference
In-Depth Information
tion. We shall see there that the best strategy involves partitioning the matrix
M
into square
blocks, rather than stripes.
2.3.3
Relational-Algebra Operations
There are a number of operations on large-scale data that are used in database queries.
Many traditional database applications involve retrieval of small amounts of data, even
though the database itself may be large. For example, a query may ask for the bank balance
of one particular account. Such queries are not useful applications of MapReduce.
However, there are many operations on data that can be described easily in terms of the
common database-query primitives, even if the queries themselves are not executed with-
in a database management system. Thus, a good starting point for exploring applications
of MapReduce is by considering the standard operations on relations. We assume you are
familiar with database systems, the query language SQL, and the relational model, but to
review, a
relation
is a table with column headers called
attributes
. Rows of the relation are
called
tuples
. The set of attributes of a relation is called its
schema
. We often write an ex-
pression like
R
(
A
1
, A
2
, . . . , A
n
) to say that the relation name is
R
and its attributes are
A
1
,
A
2
, . . . , A
n
.
EXAMPLE
2.3 In
Fig. 2.5
we see part of the relation
Links
that describes the structure of the
Web. There are two attributes,
From
and
To
. A row, or tuple, of the relation is a pair of
URL's, such that there is at least one link from the first URL to the second. For instance,
the first row of
Fig. 2.5
is the pair (
url
1
, url
2) that says the Web page
url
1 has a link to page
url
2. While we have shown only four tuples, the real relation of the Web, or the portion of
it that would be stored by a typical search engine, has billions of tuples.
□
Figure 2.5
Relation
Links
consists of the set of pairs of URL's, such that the first has one or more links to the second
A relation, however large, can be stored as a file in a distributed file system. The ele-
ments of this file are the tuples of the relation.
There are several standard operations on relations, often referred to as
relational al-
gebra
, that are used to implement queries. The queries themselves usually are written in
SQL. The relational-algebra operations we shall discuss are: