Class would be Text, and our valueClass would be IntWritable or VIntWritable,
but for simplicity we'll work with IntWritable in Examples 5-20 through 5-22.
Example 5-20. Loading a SequenceFile in Python
data = sc.sequenceFile(inFile,
  "org.apache.hadoop.io.Text", "org.apache.hadoop.io.IntWritable")
Example 5-21. Loading a SequenceFile in Scala
val data = sc.sequenceFile(inFile, classOf[Text], classOf[IntWritable]).
  map{case (x, y) => (x.toString, y.get())}
Example 5-22. Loading a SequenceFile in Java
public static class ConvertToNativeTypes implements
  PairFunction<Tuple2<Text, IntWritable>, String, Integer> {
  // Convert the Hadoop Writable key and value to native Java types.
  public Tuple2<String, Integer> call(Tuple2<Text, IntWritable> record) {
    return new Tuple2<String, Integer>(record._1.toString(), record._2.get());
  }
}

JavaPairRDD<Text, IntWritable> input = sc.sequenceFile(fileName, Text.class,
  IntWritable.class);
JavaPairRDD<String, Integer> result = input.mapToPair(
  new ConvertToNativeTypes());
In Scala there is a convenience function that can automatically
convert Writables to their corresponding Scala type. Instead of
specifying the keyClass and valueClass, we can call
sequenceFile[Key, Value](path, minPartitions) and get back an RDD
of native Scala types.
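As a minimal sketch of that convenience function, the program below loads the same kind of file directly as (String, Int) pairs. The object name, the local master setting, and the default input path are assumptions made for illustration; only the sequenceFile[Key, Value] call comes from the text above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver object; the name is illustrative only.
object LoadSequenceFileNative {
  def main(args: Array[String]): Unit = {
    // A local master is assumed so the sketch can run without a cluster.
    val conf = new SparkConf().setMaster("local[*]").setAppName("LoadSequenceFileNative")
    val sc = new SparkContext(conf)

    // Hypothetical input path; pass a real SequenceFile path as the first argument.
    val inFile = args.headOption.getOrElse("pandas.seq")

    // The type parameters name the native Scala types we want back.
    // Text keys and IntWritable values are converted to String and Int for us.
    val data = sc.sequenceFile[String, Int](inFile)

    data.collect().foreach { case (name, count) => println(s"$name: $count") }

    sc.stop()
  }
}
```

On older Spark 1.x releases the implicit converters may also require import org.apache.spark.SparkContext._ to be in scope.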
Saving SequenceFiles
Writing the data out to a SequenceFile is fairly similar in Scala. First, because
SequenceFiles are key/value pairs, we need a PairRDD with types that our SequenceFile can
write out. Implicit conversions between Scala types and Hadoop Writables exist for
many native types, so if you are writing out a native type you can just save your
PairRDD by calling saveAsSequenceFile(path), and it will write out the data for you.
If there isn't an automatic conversion from our key and value to Writable, or we want
to use variable-length types (e.g., VIntWritable), we can just map over the data and
convert it before saving. Let's consider writing out the data that we loaded in the
previous example (people and how many pandas they have seen), as shown in
Example 5-23.
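For the second case described above, where we want a variable-length type rather than the automatic conversion, a rough sketch might look like the following. The object name, the sample records, the local master setting, and the output path are all made up for illustration and are not taken from the book's own example.

```scala
import org.apache.hadoop.io.{Text, VIntWritable}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver object; the name is illustrative only.
object SaveVIntSequenceFile {
  def main(args: Array[String]): Unit = {
    // A local master is assumed so the sketch can run without a cluster.
    val conf = new SparkConf().setMaster("local[*]").setAppName("SaveVIntSequenceFile")
    val sc = new SparkContext(conf)

    // Hypothetical output directory; it must not already exist.
    val outputFile = args.headOption.getOrElse("pandas-out")

    // Made-up sample records: a person and how many pandas they have seen.
    val data = sc.parallelize(Seq(("Holden", 3), ("Kay", 6)))

    // Map each pair to explicit Writable types before saving, so the
    // counts are stored as VIntWritable instead of the default IntWritable.
    val writables = data.map { case (name, count) =>
      (new Text(name), new VIntWritable(count))
    }

    writables.saveAsSequenceFile(outputFile)

    sc.stop()
  }
}
```

The map is followed directly by the save, so the Writable objects are written out without being shuffled or cached; Hadoop Writables are not Java-serializable, so that ordering is deliberate.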