that can be compiled and used in queries. Due to its popularity, there is a large ecosystem of tools available for Hive, including command-line tools, the Hive Web interface, and various connectors, such as JDBC drivers, that give external software access to Hive.
Hive is not the only distributed data warehousing solution. The AMPLab Shark project extends the Hive codebase to operate over data using the Spark distributed processing engine. Shark's in-memory model lets queries return results many times faster than a typical Hive query. Shark can be used in conjunction with existing Hadoop clusters. Although Shark is relatively new, it is gaining popularity as a replacement for Hive when interactive, ad hoc querying is necessary.
Hive is a popular choice for users who need to ask questions about datasets that are too large for relational databases to handle or that are relatively unstructured. Hive is also useful for datasets that are constantly growing, because it scales well across many machines, whereas scaling other approaches in the same way can be economically challenging. In addition, Hive complements existing Hadoop installations well, giving nondeveloper analysts access to data that would otherwise require complicated code to query.
 