Application Architectures for Big Data and Analytics - Big Data Imperatives


partially fill the gap. It implements a column-oriented data store modeled on Google's

BigTable on top of Hadoop and HDFS, and it also provides indexing for HDFS. With

HBase it is possible to have multiple large tables or even just one large table distributed

beneath Hadoop.
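
As a rough illustration of the BigTable-style data model mentioned above, a table can be pictured as a sorted map from row key to column to timestamped versions of a value. The sketch below is a toy, in-memory model only: the class name ToyColumnStore, its methods, and the sample row keys are hypothetical, and none of this is the HBase API. It only shows why reading the latest value of a cell and scanning a contiguous range of row keys come naturally from this layout.

```python
from collections import defaultdict


class ToyColumnStore:
    """Toy in-memory model of a BigTable-style table (hypothetical, not the HBase API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        """Store a new version of a cell."""
        versions = self._rows[row_key][column]
        versions.append((timestamp, value))
        # Keep the newest version at the front of the list.
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row_key, column):
        """Return the latest value of a cell, or None if the cell is absent."""
        if row_key not in self._rows:
            return None
        versions = self._rows[row_key].get(column)
        return versions[0][1] if versions else None

    def scan(self, start_key, stop_key):
        """Yield (row_key, {column: latest value}) for start_key <= key < stop_key."""
        for key in sorted(self._rows):
            if start_key <= key < stop_key:
                yield key, {c: v[0][1] for c, v in self._rows[key].items()}


store = ToyColumnStore()
store.put("user#001", "info:name", "Ada", timestamp=1)
store.put("user#001", "info:name", "Ada L.", timestamp=2)
store.put("user#002", "info:name", "Bob", timestamp=1)
store.put("video#001", "meta:title", "Demo", timestamp=1)
```

Because rows are kept in key order, a range scan over a key prefix such as "user#" touches only neighboring rows. In a real distributed store that ordering is what allows a large table to be split into contiguous key ranges held on different machines.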

There are a few areas where Hadoop, in its current form, scores well. An obvious one

is as an extract, transform, load (ETL) staging system when an organization has a flood of

data and only a small proportion can be put to use. The data can be stored in Hadoop and

jobs run to extract useful data to put into a database for deeper analysis.
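
The staging pattern described above can be sketched in a few lines. This is a minimal, single-process illustration and not a Hadoop job: the record layout ("user,action,amount"), the function names, and the use of an in-memory SQLite database as the target are all assumptions made for the example. A map step discards the records that are of no use, a reduce step aggregates what remains, and the small result is loaded into a database for further analysis.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw records as they might sit in a staging area: "user,action,amount"
RAW_LINES = [
    "alice,purchase,30.0",
    "bob,page_view,",
    "alice,purchase,12.5",
    "corrupted line",
    "bob,purchase,7.0",
]


def map_record(line):
    """Map step: emit (user, amount) for purchase records and skip everything else."""
    parts = line.split(",")
    if len(parts) != 3 or parts[1] != "purchase":
        return []
    try:
        return [(parts[0], float(parts[2]))]
    except ValueError:
        return []  # malformed amount: treat the record as unusable


def reduce_totals(pairs):
    """Reduce step: sum the amounts per user."""
    totals = defaultdict(float)
    for user, amount in pairs:
        totals[user] += amount
    return dict(totals)


def load(totals, conn):
    """Load step: write the aggregated rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases (user TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO purchases VALUES (?, ?)", sorted(totals.items())
    )
    conn.commit()


def run_etl(lines, conn):
    """Run extract (map), transform (reduce), and load in sequence."""
    pairs = [pair for line in lines for pair in map_record(line)]
    totals = reduce_totals(pairs)
    load(totals, conn)
    return totals


conn = sqlite3.connect(":memory:")
totals = run_etl(RAW_LINES, conn)
```

In a real deployment the map and reduce steps would run in parallel over files held in HDFS, and only the reduced output would be moved into the analytical database.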

Hadoop was built as a parallel processing environment for large data volumes,

not as a database. For that reason, it can be very useful if you need to manipulate data

in sophisticated ways. For example, it has been used both to render 3D video and for

scientific programming.

It is a massively parallel platform that can be used in many ways. Database

capabilities have been added, but even with these it is still best not to think of it as a

database product. The open-source nature of Hadoop allowed developers to try it, and

this drove its early popularity, as discussed in Chapter 4. Because it became popular,

many vendors began to exploit its capabilities, adding to it or linking it to their databases.

Hadoop has generated its own software ecosystem (Figure 5-5).

Figure 5-5. Hadoop conceptual framework
