Databases Reference
In-Depth Information
Matrix
M
Vector
v
Figure 2.4: Division of a matrix and vector into fives stripes
We shall take up matrix-vector multiplication using map-reduce again in
Section 5.2. There, because of the particular application (PageRank calcula-
tion), we have an additional constraint that the result vector should be par-
titioned in the same way as the input vector, so the output may become the
input for another iteration of the matrix-vector multiplication. We shall see
there that the best strategy involves partitioning the matrix M into square
blocks, rather than stripes.
2.3.3
Relational-Algebra Operations
There are a number of operations on large-scale data that are used in database
queries. In many traditional database applications, these queries involve re-
trieval of small amounts of data, even though the database itself may be large.
For example, a query may ask for the bank balance of one particular account.
Such queries are not useful applications of map-reduce.
However, there are many operations on data that can be described easily in
terms of the common database-query primitives, even if the queries themselves
are not executed within a database management system. Thus, a good start-
ing point for seeing applications of map-reduce is by considering the standard
operations on relations. We assume you are familiar with database systems,
the query language SQL, and the relational model, but to review, a relation is
a table with column headers called attributes. Rows of the relation are called
tuples. The set of attributes of a relation is called its schema. We often write
an expression like R(A
1
, A
2
, . . . , A
n
) to say that the relation name is R and its
attributes are A
1
, A
2
, . . . , A
n
.
Example 2.3 : In Fig. 2.5 we see part of the relation Links that describes the
structure of the Web. There are two attributes, From and To. A row, or tuple,
of the relation is a pair of URL's, such that there is at least one link from
the first URL to the second. For instance, the first row of Fig. 2.5 is the pair