History of Apache Hadoop and its trends
We live in an era where almost everything around us generates some kind of data. A click on a web page is logged on the server. The flipping of channels while watching TV is captured by cable companies. A search on a search engine is logged. A patient's heartbeat in a hospital generates data. A single phone call generates data, which is stored and maintained by telecom companies. An order for a pizza generates data. These days, it is very difficult to find a process that doesn't generate and store data.
Why would any organization want to store data? The present and the future belong to those who hold onto their data and work with it to improve their current operations and innovate to create new products and opportunities. Data, and the creative use of it, is at the heart of organizations such as Google, Facebook, Netflix, Amazon, and Yahoo!. They have proven that data, combined with powerful analysis, helps in building remarkable products.
Organizations have been storing data for many years now. However, that data remained on backup tapes or drives. Once archived on storage devices such as tapes, it was used only in emergencies to retrieve important records, and processing or analyzing it efficiently for insight was very difficult. This is changing. Organizations now want to use this data to understand existing problems, seize new opportunities, and become more profitable. The study and analysis of these vast volumes of data has given birth to the term big data, a phrase often used to promote the importance of ever-growing data and the technologies applied to analyze it.
Companies big and small now understand the importance of data and are adding loggers to their operations, generating more data every day. This has given rise to a very important problem: storing data and retrieving it efficiently for analysis. With data growing at such a rapid rate, traditional tools for storage and analysis fall short. Although the cost per byte has dropped considerably and the capacity to store data has increased, disk transfer rates have remained roughly the same, and this has become a bottleneck for processing large volumes of data. Data in many organizations has reached petabytes and continues to grow. Several companies have worked on this problem and have come out with commercial offerings that leverage the power of distributed computing. In this approach, multiple computers work together (a cluster) to store and process large volumes of data in parallel, making the analysis of such data possible.
Google, the Internet search engine giant, ran into this issue when the data it acquired by crawling the Web grew to such large volumes that it became increasingly difficult to store and process.