2
Hosting and Sharing Terabytes of Raw Data
Poor fellow, he suffers from files.
—Aneurin Bevan
The two truths of the Internet are “no one knows you're a dog,” and it's easy to share
lots of data with the world, right?
Sharing large amounts of open data should be common practice for governments
and research organizations. Data can help inform intelligent policy making as well as
provide innovative kindling for investigative journalism, but public and municipal
datasets are rarely easy to find. In fact, municipalities that provide loads of publicly
available data are often celebrated in the media as innovative pioneers rather than
competent governments just doing their jobs. Even when data is freely available, it can be
shared using data formats that are nearly impossible for people and computer programs
alike to consume in a meaningful way. The sharing of public data online, a task that
should seemingly be simple and even taken for granted, is currently the exception
rather than the norm. In 2011, the famous Web comic XKCD even published a telling
panel that described the process of sending [large] files across the Internet as
“something early adopters are still figuring out how to do.” 1
Despite all of these problems, storing and sharing large amounts of data as
thousands or even millions of separate documents is no longer a technological or
economic impossibility.
This chapter will explore the technical challenges faced as part of the seemingly
simple task of sharing a large collection of documents for public consumption, and
which technologies are available to overcome these challenges. Our goal is to
understand how to make great choices when faced with similar situations and which
tools can help.
1. http://xkcd.com/949/
 
 
 