Database Reference
In-Depth Information
itself, but we first need to imagine that it is two relations, with different schemas, so we
can describe the desired connection as a natural join. Thus, imagine that there are two cop-
ies of Links , namely L 1( U 1 , U 2) and L 2( U 2 , U 3). Now, if we compute L 1 L 2, we shall
have exactly what we want. That is, for each tuple t 1 of L 1 (i.e., each tuple of Links ) and
each tuple t 2 of L 2 (another tuple of Links , possibly even the same tuple), see if their U 2
components are the same. Note that these components are the second component of t 1 and
the first component of t 2. If these two components agree, then produce a tuple for the res-
ult, with schema ( U 1 , U 2 , U 3). This tuple consists of the first component of t 1, the second
component of t 1 (which must equal the first component of t 2), and the second component
of t 2.
We may not want the entire path of length two, but only want the pairs ( u, w ) of URL's
such that there is at least one path from u to w of length two. If so, we can project out the
middle components by computing π U 1 , U 3 ( L 1 L 2).
EXAMPLE 2.5 Imagine that a social-networking site has a relation
Friends(User, Friend)
This relation has tuples that are pairs ( a, b ) such that b is a friend of a . The site might want
to develop statistics about the number of friends members have. Their first step would be
to compute a count of the number of friends of each user. This operation can be done by
grouping and aggregation, specifically
γ U ser , COUNT ( F riend) (Friends)
This operation groups all the tuples by the value in their first component, so there is one
group for each user. Then, for each group the count of the number of friends of that user
is made. The result will be one tuple for each group, and a typical tuple would look like
(Sally , 300), if user “Sally” has 300 friends.
2.3.4
Computing Selections by MapReduce
Selections really do not need the full power of MapReduce. They can be done most con-
veniently in the map portion alone, although they could also be done in the reduce portion
alone. Here is a MapReduce implementation of selection σ C ( R ).
x For each tuple t in R , test if it satisfies C . If so, produce the key-value pair ( t, t ). That
is, both the key and value are t .
The Reduce Function The Reduce function is the identity. It simply passes each key-value
pair to the output.
Search WWH ::




Custom Search