HCatalog
So, what is HCatalog? Simply put, HCatalog provides a tabular abstraction
of the HDFS files stored in Hadoop. A number of tools then leverage this
abstraction when working with the data. Pig, Hive, and MapReduce all
use this abstraction to reduce the complexity and overhead of reading and
writing data to Hadoop.
HDFS files can, in theory, be in any format, and the data blocks can be
placed anywhere on the cluster. HCatalog provides the mechanism for
mapping both the file formats and data locations to the tabular view of the
data. Again, HCatalog is open and extensible to allow for the fact that some
file formats may be proprietary. Additional coding would be required, but
the fact that a file format in HDFS was previously unknown would not be a
blocker to using HCatalog.
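
To make the mapping idea concrete, here is a minimal sketch in Python. It is a toy model only, not HCatalog's actual API: the names CATALOG, FILES, READERS, register_reader, and read_table are all hypothetical, and an in-memory dictionary stands in for HDFS. It shows a catalog entry tying a table name to a location, a storage format, and a column list, plus a registry through which a reader for a previously unknown format can be plugged in.

```python
import csv
import io
import json

# Hypothetical registry: storage format name -> function that turns raw
# file text into rows. This is an illustration, not a real HCatalog API.
READERS = {}


def register_reader(format_name, reader):
    """Plug in support for a new (possibly proprietary) file format."""
    READERS[format_name] = reader


def read_csv(text, columns):
    """Parse comma-separated text into one dict per row."""
    return [dict(zip(columns, row)) for row in csv.reader(io.StringIO(text))]


def read_json_lines(text, columns):
    """Parse one JSON object per line, keeping only the declared columns."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            record = json.loads(line)
            rows.append({c: record.get(c) for c in columns})
    return rows


register_reader("csv", read_csv)
register_reader("jsonl", read_json_lines)

# Stand-in for HDFS: path -> file contents (assumed sample data).
FILES = {
    "/data/logs/part-0000": "alice,/home\nbob,/cart\n",
    "/data/clicks/part-0000": '{"user": "alice", "button": "buy"}\n',
}

# The catalog: table name -> where the data lives, how it is stored,
# and which columns the tabular view exposes.
CATALOG = {
    "web_logs": {
        "location": "/data/logs/part-0000",
        "format": "csv",
        "columns": ["user", "url"],
    },
    "clicks": {
        "location": "/data/clicks/part-0000",
        "format": "jsonl",
        "columns": ["user", "button"],
    },
}


def read_table(name):
    """Read a table by name; the caller never sees the path or the format."""
    entry = CATALOG[name]
    reader = READERS[entry["format"]]
    return reader(FILES[entry["location"]], entry["columns"])


for table_name in CATALOG:
    print(table_name, read_table(table_name))
```

The caller of read_table works only with table names and columns. Supporting a new file format in this sketch means writing one more reader function and registering it, which mirrors the "additional coding would be required" point above.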
Apache HCatalog is technically no longer a separate Apache project. It is
still an important feature, but its codebase was merged into the Hive project
early in 2013. HCatalog is built on top of the Hive metastore and leverages
Hive's command-line interface for issuing commands against the catalog.
One way to think about HCatalog is as the master database for Hive. In that
sense, HCatalog provides the catalog views and interfaces for your Hadoop
“database.”
HBase
HBase is an interesting project because it provides NoSQL database
functionality on top of HDFS. It is also a column-oriented store (columns are
grouped into column families), providing fast access to large quantities of
data that are often sparsely populated. HBase also offers a degree of
transactional support to Hadoop (operations on a single row are atomic),
enabling a level of Data Manipulation Language (DML) (that is, inserts,
updates, and deletes). However, HBase does not offer a native SQL interface;
remember, it is part of the NoSQL family. It also does not offer a number of
other RDBMS features, such as typed columns, security, enhanced data
programmability features, and query languages.
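
The following Python sketch illustrates the data model just described. It is a toy, in-memory model and not the HBase client API: the class SparseTable and its put, get, and delete methods are hypothetical names chosen for illustration. Each row stores only the cells it actually has, cell values are untyped bytes, and inserts, updates, and deletes act on individual cells. Real HBase also keeps timestamped versions of each cell, which this sketch leaves out.

```python
class SparseTable:
    """Toy model of a sparse, column-family-oriented table."""

    def __init__(self, families):
        # Column families are fixed up front; qualifiers within a family
        # can vary freely from row to row.
        self.families = set(families)
        self.rows = {}  # row key -> {(family, qualifier): bytes}

    def put(self, row_key, family, qualifier, value):
        """Insert or update a single cell; values are raw, untyped bytes."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if not isinstance(value, bytes):
            raise TypeError("cell values must be bytes")
        self.rows.setdefault(row_key, {})[(family, qualifier)] = value

    def get(self, row_key):
        """Return a copy of the cells that exist for this row."""
        return dict(self.rows.get(row_key, {}))

    def delete(self, row_key, family=None, qualifier=None):
        """Delete one cell, or the whole row when no column is given."""
        if family is None:
            self.rows.pop(row_key, None)
            return
        cells = self.rows.get(row_key, {})
        cells.pop((family, qualifier), None)
        if not cells:
            # A row with no cells left simply stops existing.
            self.rows.pop(row_key, None)


table = SparseTable(families=["info", "stats"])

# Inserts: the two rows populate different columns, so nothing is stored
# for the columns a row does not use.
table.put("user1", "info", "name", b"Alice")
table.put("user1", "stats", "visits", b"1")
table.put("user2", "info", "email", b"bob@example.com")

# Update: writing to an existing cell replaces its value.
table.put("user1", "stats", "visits", b"2")

# Delete: removing the only cell of user2 removes the row.
table.delete("user2", "info", "email")

print(table.get("user1"))
print(table.get("user2"))
```

Because a missing cell takes no space in this model, a table can declare a very wide set of possible columns while each row pays only for the cells it populates. That is the property that makes a sparse layout practical.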
HBase is designed to work with large tables, but you are unlikely to ever see
a table like this in an RDBMS (not even in a SharePoint database). HBase
tables can have billions of rows, which is not uncommon these days; but in
conjunction with that, those rows can have an almost limitless number of