History of Apache Hadoop and its trends
We live in an era where almost everything around us generates some kind of data. A click on a web page is logged on the server. The flipping of channels while watching TV is captured by cable companies. A search on a search engine is logged. A patient's heartbeat in a hospital generates data. A single phone call generates data, which is stored and maintained by telecom companies. An order for a pizza generates data. These days, it is very difficult to find a process that doesn't generate and store data.
Why would any organization want to store data? The present and the future belong to those who hold onto their data and work with it to improve their current operations and innovate to create new products and opportunities. Data, and the creative use of it, is at the heart of organizations such as Google, Facebook, Netflix, Amazon, and Yahoo!. They have proven that data, combined with powerful analysis, helps in building remarkable products.
Organizations have been storing data for many years now. However, that data remained on backup tapes or drives. Once archived on storage devices such as tapes, it was used only in emergencies to retrieve important records, and processing or analyzing it efficiently for insight was very difficult. This is changing. Organizations now want to use this data to understand existing problems, seize new opportunities, and become more profitable. The study and analysis of these vast volumes of data has given birth to the term big data, a phrase often used to promote the importance of ever-growing data and the technologies applied to analyze it.
Companies big and small now understand the importance of data and are adding loggers to their operations, generating more data every day. This has given rise to a very important problem: storing data and retrieving it efficiently for analysis. With data growing at such a rapid rate, traditional tools for storage and analysis fall short. Although the cost per byte has dropped considerably and the capacity to store data has increased, disk transfer rates have remained roughly the same, and this has become a bottleneck for processing large volumes of data. Data in many organizations has reached petabytes and continues to grow. Several companies have worked on this problem and have come out with commercial offerings that leverage the power of distributed computing. In this approach, multiple computers work together (a cluster) to store and process large volumes of data in parallel, making the analysis of such data possible.
Google, the Internet search engine giant, ran into this issue when the data it acquired by crawling the Web grew to such large volumes that it became increasingly difficult to store and process.