Database Reference
Google's Protocol Buffers are very similar to Thrift, and the Protocol Buffers
compiler is also available as open-source software.
Like Thrift, Protocol Buffers are used in a variety of projects. While Protocol
Buffers are generally used for moving data between systems, some organizations also
use them for persistent files. For example, OpenStreetMap is currently converting its
available map data from XML to a static format known as PBF (protocol buffer binary
format).
Apache Avro
Apache Avro is a relatively new data serialization format that combines some of the
best features of all the technologies we have described previously. Unlike Apache
Thrift or Protocol Buffers, Avro data is self-describing: a schema written in JSON
describes each individual field. However, unlike XML or JSON, this schema is not
repeated for every field; instead, it is provided once as a JSON object included
with the data itself. Avro also natively supports compression, so developers
don't have to spend much time building this logic themselves.
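To make the idea of a self-describing file concrete, here is a minimal sketch in
Python. It does not use the Avro library and does not produce real Avro binary
encoding; it only imitates the layout described above, with a JSON schema written
once alongside the data and field values stored without their names, then
compressed. The User record, its fields, and the function names pack and unpack
are assumptions invented for illustration.

```python
import json
import zlib

# Hypothetical schema, written in the JSON style that Avro schemas use.
SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}


def pack(records, schema=SCHEMA):
    """Store the schema once, then only the field values of each record."""
    names = [field["name"] for field in schema["fields"]]
    rows = [[record[name] for name in names] for record in records]
    payload = json.dumps({"schema": schema, "rows": rows}).encode("utf-8")
    # Compression stands in for the codec support mentioned in the text.
    return zlib.compress(payload)


def unpack(blob):
    """Rebuild full records using only the schema embedded in the blob."""
    container = json.loads(zlib.decompress(blob).decode("utf-8"))
    names = [field["name"] for field in container["schema"]["fields"]]
    return [dict(zip(names, row)) for row in container["rows"]]


users = [{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]
blob = pack(users)
restored = unpack(blob)
```

A reader of the blob needs nothing else to recover the field names, whereas Thrift
and Protocol Buffers keep the schema in a separate definition file.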
Avro is currently used in the Hadoop community, but it deserves to be more widely
used in other applications. Some developers are exploring the possibility of reading and
writing data in the Avro format as static files. Programming language support for Avro
has been growing, and over time it may become more of a de facto standard for data
serialization.
Summary
Sharing a large set of documents seems like a simple task, but the reality is that many
organizations that should be sharing data in the most accessible ways possible simply
aren't.
Are you hosting a massive collection of files? In order to maintain scalability and
affordability in the face of massive amounts of storage and bandwidth, distributed
storage services provided by infrastructure-as-a-service providers are generally the
only economical solution.
Choosing a file format for your hosted files can be a challenge, but less so if you
know how your audience may use your data. CSV files are trivial to parse and are
universally supported by everything from spreadsheets to database software to
programming languages. However, CSV cannot easily model complex data structures. XML
is a great format for providing an unambiguous source of data that can be used for
converting documents from one format to another. It is well supported and is best used
for giving document data a clearly defined structure. However, it is not necessarily
the best format for software developers who wish to quickly serialize, transfer, and
parse data. Although JSON is not as extensible as XML or as simple as CSV, its ease of
use for data transfer makes it a popular format for programmers and nonrelational
database administrators.
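The trade-offs among these three formats can be seen by serializing the same small
data set each way. The following sketch uses only the Python standard library; the
sample rows and the element names cities and city_record are hypothetical, chosen
only for this comparison.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data used only for this comparison.
rows = [
    {"city": "Oslo", "population": "700000"},
    {"city": "Lima", "population": "10000000"},
]

# CSV: one header line, then bare values.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "population"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# JSON: field names are repeated in every record.
json_text = json.dumps(rows)

# XML: every value is wrapped in an opening and a closing tag.
root = ET.Element("cities")
for row in rows:
    item = ET.SubElement(root, "city_record")
    for key, value in row.items():
        ET.SubElement(item, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

# Character counts give a rough sense of how much each format adds.
sizes = {"csv": len(csv_text), "json": len(json_text), "xml": len(xml_text)}
print(sizes)
```

For flat records like these, CSV is the most compact and XML the most verbose, with
JSON in between. Nested data would be straightforward to add to the JSON and XML
versions but awkward to express in CSV.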
Both XML and JSON formats carry a lot of baggage from the markup used to
describe data. As data needs scale, more large data source providers are moving from
 
 