Database Reference
In-Depth Information
tion. We shall see there that the best strategy involves partitioning the matrix M into square
blocks, rather than stripes.
2.3.3
Relational-Algebra Operations
There are a number of operations on large-scale data that are used in database queries.
Many traditional database applications involve retrieval of small amounts of data, even
though the database itself may be large. For example, a query may ask for the bank balance
of one particular account. Such queries are not useful applications of MapReduce.
However, there are many operations on data that can be described easily in terms of the
common database-query primitives, even if the queries themselves are not executed with-
in a database management system. Thus, a good starting point for exploring applications
of MapReduce is by considering the standard operations on relations. We assume you are
familiar with database systems, the query language SQL, and the relational model, but to
review, a relation is a table with column headers called attributes . Rows of the relation are
called tuples . The set of attributes of a relation is called its schema . We often write an ex-
pression like R ( A 1 , A 2 , . . . , A n ) to say that the relation name is R and its attributes are A 1 ,
A 2 , . . . , A n .
EXAMPLE 2.3 In Fig. 2.5 we see part of the relation Links that describes the structure of the
Web. There are two attributes, From and To . A row, or tuple, of the relation is a pair of
URL's, such that there is at least one link from the first URL to the second. For instance,
the first row of Fig. 2.5 is the pair ( url 1 , url 2) that says the Web page url 1 has a link to page
url 2. While we have shown only four tuples, the real relation of the Web, or the portion of
it that would be stored by a typical search engine, has billions of tuples.
Figure 2.5 Relation Links consists of the set of pairs of URL's, such that the first has one or more links to the second
A relation, however large, can be stored as a file in a distributed file system. The ele-
ments of this file are the tuples of the relation.
There are several standard operations on relations, often referred to as relational al-
gebra , that are used to implement queries. The queries themselves usually are written in
SQL. The relational-algebra operations we shall discuss are:
Search WWH ::




Custom Search