In addition to grouping data from a single RDD, we can group data sharing the same
key from multiple RDDs using a function called cogroup(). cogroup() over two
RDDs sharing the same key type, K, with the respective value types V and W gives us
back RDD[(K, (Iterable[V], Iterable[W]))]. If one of the RDDs doesn't have elements
for a given key that is present in the other RDD, the corresponding Iterable
is simply empty. cogroup() is also the building block for the joins we discuss in
the next section.
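To make the result shape concrete, here is a minimal sketch that models cogroup() with plain Scala collections instead of RDDs. It is not Spark's implementation: the object name CogroupSketch, the local cogroup helper, and the sample store data are assumptions made for illustration, and Seq and Map stand in for RDD and Iterable.

```scala
object CogroupSketch {
  // Plain-collections model of cogroup(): for every key found in either input,
  // pair the values from `left` with the values from `right`.
  // (Hypothetical helper, not part of Spark.)
  def cogroup[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    val keys = (left.map(_._1) ++ right.map(_._1)).distinct
    keys.map { k =>
      val vs = left.filter(_._1 == k).map(_._2)   // values for k on the left
      val ws = right.filter(_._1 == k).map(_._2)  // values for k on the right
      (k, (vs, ws))
    }.toMap
  }

  // Illustrative sample data.
  val storeAddress: Seq[(String, String)] = Seq(
    "Ritual" -> "1026 Valencia St",
    "Philz" -> "748 Van Ness Ave",
    "Philz" -> "3101 24th St",
    "Starbucks" -> "Seattle")
  val storeRating: Seq[(String, Double)] = Seq("Ritual" -> 4.9, "Philz" -> 4.8)

  val grouped: Map[String, (Seq[String], Seq[Double])] = cogroup(storeAddress, storeRating)

  def main(args: Array[String]): Unit =
    grouped.foreach { case (store, (addresses, ratings)) =>
      println(s"$store: addresses=$addresses ratings=$ratings")
    }
}
```

In this model "Starbucks" has an address but no rating, so its entry carries an empty sequence on the ratings side, mirroring the empty Iterable described above.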
cogroup() can be used for much more than just implementing
joins. We can also use it to implement intersect by key. Additionally,
cogroup() can work on three or more RDDs at once.
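One possible reading of "intersect by key" is: keep only the keys that have at least one value in both inputs. The sketch below shows that idea on plain Scala collections rather than RDDs. The names IntersectByKeySketch, cogroup, and intersectByKey are hypothetical helpers, not Spark APIs, and the sample data is illustrative.

```scala
object IntersectByKeySketch {
  // Plain-collections stand-in for cogroup() (hypothetical helper, not Spark's).
  def cogroup[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    val keys = (left.map(_._1) ++ right.map(_._1)).distinct
    keys.map { k =>
      (k, (left.filter(_._1 == k).map(_._2), right.filter(_._1 == k).map(_._2)))
    }.toMap
  }

  // Intersect by key: a key survives only if both sides have values for it.
  def intersectByKey[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Set[K] =
    cogroup(left, right)
      .filter { case (_, (vs, ws)) => vs.nonEmpty && ws.nonEmpty }
      .keySet

  // Illustrative sample data.
  val storeAddress: Seq[(String, String)] = Seq(
    "Ritual" -> "1026 Valencia St",
    "Philz" -> "748 Van Ness Ave",
    "Starbucks" -> "Seattle")
  val storeRating: Seq[(String, Double)] = Seq("Ritual" -> 4.9, "Philz" -> 4.8)

  val commonStores: Set[String] = intersectByKey(storeAddress, storeRating)

  def main(args: Array[String]): Unit =
    println(commonStores)
}
```

On a real pair RDD the same filter could be applied to the output of cogroup(), testing that both Iterables are non-empty.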
Joins
Some of the most useful operations we get with keyed data come from using it
together with other keyed data. Joining data together is probably one of the most
common operations on a pair RDD, and we have a full range of options including
right and left outer joins, cross joins, and inner joins.
The simple join operator is an inner join.[1] Only keys that are present in both pair
RDDs are output. When there are multiple values for the same key in one of the
inputs, the resulting pair RDD will have an entry for every possible pair of values
with that key from the two input RDDs. A simple way to understand this is by
looking at Example 4-17.
Example 4-17. Scala shell inner join
storeAddress = {
  (Store("Ritual"), "1026 Valencia St"), (Store("Philz"), "748 Van Ness Ave"),
  (Store("Philz"), "3101 24th St"), (Store("Starbucks"), "Seattle")}
storeRating = {
  (Store("Ritual"), 4.9), (Store("Philz"), 4.8)}
storeAddress.join(storeRating) == {
  (Store("Ritual"), ("1026 Valencia St", 4.9)),
  (Store("Philz"), ("748 Van Ness Ave", 4.8)),
  (Store("Philz"), ("3101 24th St", 4.8))}
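The inner join semantics can also be sketched as "cogroup, then emit every pair of values per key". The code below does this with plain Scala collections rather than RDDs, so it is a model of the behavior and not Spark's implementation. The names InnerJoinSketch, cogroup, and join are hypothetical helpers, and plain strings stand in for the Store keys.

```scala
object InnerJoinSketch {
  // Plain-collections stand-in for cogroup() (hypothetical helper, not Spark's).
  def cogroup[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    val keys = (left.map(_._1) ++ right.map(_._1)).distinct
    keys.map { k =>
      (k, (left.filter(_._1 == k).map(_._2), right.filter(_._1 == k).map(_._2)))
    }.toMap
  }

  // Inner join: for each key, emit every (left value, right value) pair.
  // A key with no values on one side yields no pairs, so it drops out.
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k, (vs, ws)) <- cogroup(left, right).toSeq
      v <- vs
      w <- ws
    } yield (k, (v, w))

  // Same data as the example above, with String keys in place of Store.
  val storeAddress: Seq[(String, String)] = Seq(
    "Ritual" -> "1026 Valencia St",
    "Philz" -> "748 Van Ness Ave",
    "Philz" -> "3101 24th St",
    "Starbucks" -> "Seattle")
  val storeRating: Seq[(String, Double)] = Seq("Ritual" -> 4.9, "Philz" -> 4.8)

  val joined: Seq[(String, (String, Double))] = join(storeAddress, storeRating)

  def main(args: Array[String]): Unit =
    joined.foreach(println)
}
```

"Philz" appears twice in the output because it has two addresses and one rating, while "Starbucks" is absent because it has no rating. The order of the output entries is not guaranteed in this sketch.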
[1] “Join” is a database term for combining fields from two tables using common values.
 