Even a couple of years ago, a terabyte of personal data may have seemed quite large. However, local hard drives and backup drives are now commonly available at this size. In the next couple of years, it wouldn't be surprising if your default hard drive were a few terabytes in capacity. We are living in an age of rampant data growth. Our digital camera outputs, blogs, daily social networking updates, tweets, electronic documents, scanned content, music files, and videos are growing at a rapid pace. We are consuming a lot of data and producing it too.
It's difficult to assess the true size of digitized data or the size of the Internet, but a few studies, estimates, and data points reveal that it's immensely large, in the range of a zettabyte or more. In an ongoing study titled “The Digital Universe Decade - Are you ready?” (http://emc.com/collateral/demos/microsites/idc-digital-universe/iview.htm), IDC, on behalf of EMC, presents a view into the current state of digital data and its growth. The report claims that the total size of digital data created and replicated will grow to 35 zettabytes by 2020. The report also claims that the amount of data produced and available now is outgrowing the amount of available storage.
A few other data points worth considering are as follows:
A 2009 ACM paper titled “MapReduce: simplified data processing on large clusters” (http://portal.acm.org/citation.cfm?id=1327452.1327492&coll=GUIDE&dl=&idx=J79&part=magazine&WantType=Magazines&title=Communications%20of%20the%20ACM) revealed that Google processes 24 petabytes of data per day.
A 2009 post from Facebook about its photo storage system, “Needle in a haystack: efficient storage of billions of photos” (http://facebook.com/note.php?note_id=76191543919), mentioned that the total size of photos on Facebook was 1.5 petabytes. The same post mentioned that around 60 billion images were stored on Facebook.
The Internet Archive FAQ at archive.org/about/faqs.php says that 2 petabytes of data are stored in the Internet Archive and that the data is growing at a rate of 20 terabytes per month.
The movie Avatar took up 1 petabyte of storage space for the rendering of 3D CGI effects (“Believe it or not: Avatar takes 1 petabyte of storage space, equivalent to a 32-year-long MP3,” http://thenextweb.com/2010/01/01/avatar-takes-1-petabyte-storage-space-equivalent-32-year-long-mp3/).
As the size of data grows and sources of data creation become increasingly diverse, the following challenges will be further amplified:
Efficiently storing and accessing large amounts of data is difficult. The additional demands of fault tolerance and backups make things even more complicated.
Manipulating large data sets involves running immensely parallel processes. Gracefully recovering from any failures during such a run and providing results in a reasonably short period of time is complex (a minimal sketch of this map-and-reduce style of processing follows these points).
Managing the continuously evolving schema and metadata for semi-structured and unstructured data, generated by diverse sources, is a convoluted problem.
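
To make the parallel-processing challenge above concrete, the following minimal sketch expresses a word count in the map-and-reduce style popularized by the Google paper cited earlier. It is a single-process illustration written in Python; the sample corpus, function names, and in-memory shuffle are assumptions made for this example, whereas real frameworks partition the map and reduce phases across many machines and rerun failed tasks to recover from failures.

# Minimal, single-process sketch of the map-and-reduce model (illustrative only,
# not Google's implementation): map emits (word, 1) pairs, a shuffle groups them
# by key, and reduce sums the counts for each word.
from collections import defaultdict

def map_phase(document):
    # Emit a (key, value) pair for every word in the document.
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Combine all values emitted for a single key.
    return key, sum(values)

def mapreduce(documents):
    # Shuffle step: group intermediate values by key.
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    # Reduce step: one call per distinct key.
    return dict(reduce_phase(k, v) for k, v in grouped.items())

if __name__ == "__main__":
    docs = ["big data needs big storage", "big clusters process big data"]
    print(mapreduce(docs))  # {'big': 4, 'data': 2, ...}

The structure, rather than the toy task, is the point: because each map call and each reduce call is independent, a distributed engine can spread them across a cluster and simply rerun the pieces that fail.
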
Therefore, the ways and means of storing and retrieving large amounts of data need newer approaches beyond our current methods. NoSQL and related big-data solutions are a first step forward in that direction.