  enum VenueType {
    COFFEESHOP = 0;
    WORKPLACE = 1;
    CLUB = 2;
    OMNOMNOM = 3;
    OTHER = 4;
  }
}

message VenueResponse {
  repeated Venue results = 1;
}
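The numbers in this schema mean different things depending on where they appear. In a message such as VenueResponse, "= 1" is a field number that identifies the field on the wire. In the VenueType enum, "CLUB = 2" is the value that gets stored. The following is a minimal sketch, assuming the standard protocol buffer wire encoding, of how a field number becomes the key written before each field and how small integers such as enum values are encoded as varints. The names ProtoTagSketch, tag, and varint are hypothetical and are not part of Spark, Elephant Bird, or the protobuf library.

```scala
// Hypothetical sketch of two protobuf wire-format rules:
// a field's key is (fieldNumber << 3) | wireType, and integers are varints.
object ProtoTagSketch {
  // Wire types from the protobuf encoding rules
  val Varint = 0          // used for int32 fields and enums such as VenueType
  val LengthDelimited = 2 // used for strings and embedded messages such as Venue

  // Combine a field number and a wire type into the key for that field
  def tag(fieldNumber: Int, wireType: Int): Int = (fieldNumber << 3) | wireType

  // Encode a non-negative Int as a base-128 varint:
  // 7 payload bits per byte, low-order group first, high bit set on all but the last byte
  def varint(value: Int): List[Int] = {
    require(value >= 0, "this sketch only handles non-negative values")
    if (value < 0x80) List(value)
    else ((value & 0x7F) | 0x80) :: varint(value >>> 7)
  }

  def main(args: Array[String]): Unit = {
    // VenueResponse.results is field 1 holding embedded messages
    println(tag(1, LengthDelimited))
    // The enum value CLUB = 2 fits in a single varint byte
    println(varint(2))
    // A larger value needs more than one byte
    println(varint(300))
  }
}
```

Running it prints 10, List(2), and List(172, 2). Field 1 with wire type 2 gives the key byte 0x0A, and 300 needs two bytes because each byte carries only 7 bits of payload.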
Twitter's Elephant Bird library, which we used in the previous section to load JSON
data, also supports loading and saving data in protocol buffer format. Let's look at
writing out some Venues in Example 5-28.
Example 5-28. Elephant Bird protocol buffer writeout in Scala
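import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import com.twitter.elephantbird.mapreduce.io.ProtobufWritable
import com.twitter.elephantbird.mapreduce.output.LzoProtobufBlockOutputFormat

// Assumes sc (a SparkContext), outputFile (a path string), and the
// Places class generated from the .proto file above are already available.
// Record which protobuf class the output format will write.
val job = new Job()
val conf = job.getConfiguration
LzoProtobufBlockOutputFormat.setClassConf(classOf[Places.Venue], conf)
// Build a Venue message with the generated builder
val dnaLounge = Places.Venue.newBuilder()
dnaLounge.setId(1)
dnaLounge.setName("DNA Lounge")
dnaLounge.setType(Places.Venue.VenueType.CLUB)
val data = sc.parallelize(List(dnaLounge.build()))
// Wrap each message in a ProtobufWritable and pair it with a null key
val outputData = data.map { pb =>
  val protoWritable = ProtobufWritable.newInstance(classOf[Places.Venue])
  protoWritable.set(pb)
  (null, protoWritable)
}
// Write the records as LZO-compressed protobuf blocks
outputData.saveAsNewAPIHadoopFile(outputFile, classOf[Text],
  classOf[ProtobufWritable[Places.Venue]],
  classOf[LzoProtobufBlockOutputFormat[ProtobufWritable[Places.Venue]]], conf)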
A full version of this example is available in the source code for this topic.
When building your project, make sure to use the same protocol
buffer library version as Spark. As of this writing, that is version
2.5.
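In an sbt build, one way to follow this advice is to pin the protobuf runtime explicitly. The fragment below is a sketch assuming an sbt project; the 2.5.0 artifact corresponds to the version named above. The Java classes generated from the .proto file should also come from the matching compiler release (protoc --version reports libprotoc 2.5.0).

```scala
// build.sbt (sketch, assuming an sbt project): pin the protobuf runtime
// to the same 2.5 release that Spark uses
libraryDependencies += "com.google.protobuf" % "protobuf-java" % "2.5.0"
```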
File Compression
Frequently when working with Big Data, we find ourselves needing to use compressed
data to save storage space and network overhead. With most Hadoop output