  enum VenueType {
    COFFEESHOP = 0;
    WORKPLACE = 1;
    CLUB = 2;
    OMNOMNOM = 3;
    OTHER = 4;
  }
}

message VenueResponse {
  repeated Venue results = 1;
}
Twitter's Elephant Bird library, which we used in the previous section to load JSON
data, also supports loading and saving protocol buffer data. Let's look at writing
out some Venues in Example 5-28.
Example 5-28. Elephant Bird protocol buffer writeout in Scala
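import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext._ // pair-RDD implicits (required before Spark 1.3)
import com.twitter.elephantbird.mapreduce.io.ProtobufWritable
import com.twitter.elephantbird.mapreduce.output.LzoProtobufBlockOutputFormat
// Places is the class protoc generates from the .proto definition above;
// sc (a SparkContext) and outputFile (an output path) are defined elsewhere.

val job = new Job()
val conf = job.getConfiguration
// Tell the output format which protocol buffer class it will write
LzoProtobufBlockOutputFormat.setClassConf(classOf[Places.Venue], conf)

// Build a single Venue record
val dnaLounge = Places.Venue.newBuilder()
dnaLounge.setId(1)
dnaLounge.setName("DNA Lounge")
dnaLounge.setType(Places.Venue.VenueType.CLUB)
val data = sc.parallelize(List(dnaLounge.build()))

// Wrap each message in a ProtobufWritable; the key is unused, so pass null
val outputData = data.map { pb =>
  val protoWritable = ProtobufWritable.newInstance(classOf[Places.Venue])
  protoWritable.set(pb)
  (null, protoWritable)
}

// The output format is parameterized by the message type, Places.Venue
outputData.saveAsNewAPIHadoopFile(
  outputFile,
  classOf[Text],
  classOf[ProtobufWritable[Places.Venue]],
  classOf[LzoProtobufBlockOutputFormat[Places.Venue]],
  conf)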
A full version of this example is available in the source code for this topic.
When building your project, make sure to use the same protocol
buffer library version as Spark. As of this writing, that is version
2.5.
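One way to follow this advice, assuming an sbt build (Maven or Gradle users would pin the same coordinates), is to declare the protobuf-java dependency explicitly and override any version pulled in transitively by other libraries. This is a sketch rather than a definitive setup: 2.5.0 is taken from the version named above, and you should confirm it against the Spark release you actually deploy. The `protoc` used to generate the `Places` classes should be the matching 2.5.x release as well.

```scala
// build.sbt (sketch): pin protobuf-java to the version Spark was built with.
// 2.5.0 matches the version named in the text; verify it for your Spark release.
libraryDependencies += "com.google.protobuf" % "protobuf-java" % "2.5.0"

// Force this version even if another dependency (for example, Elephant Bird)
// brings in a different protobuf-java transitively.
dependencyOverrides += "com.google.protobuf" % "protobuf-java" % "2.5.0"
```

Overriding rather than only adding the dependency matters because a mismatched protobuf runtime typically shows up only at run time, when generated classes fail to load.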
File Compression
Frequently when working with Big Data, we find ourselves needing to use compressed
data to save storage space and network overhead. With most Hadoop output