Analyzing Big Data with Hive - Professional NoSQL


If you need additional information on physical files, include EXTENDED between EXPLAIN and the query.
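
As a minimal sketch of the two forms, assume a table named ratings with columns movieid and rating; the table, its columns, and the query are illustrative assumptions rather than taken from the text above. The only difference between the two statements is the EXTENDED keyword placed between EXPLAIN and the query.

```sql
-- Assumed table: ratings(userid INT, movieid INT, rating INT, tstamp STRING).
-- Table and column names are hypothetical, chosen only for illustration.

-- Basic plan: the stages of the query and their dependencies.
EXPLAIN
SELECT movieid, AVG(rating)
FROM ratings
GROUP BY movieid;

-- Extended plan: the same stages plus additional detail,
-- including information on the physical files involved.
EXPLAIN EXTENDED
SELECT movieid, AVG(rating)
FROM ratings
GROUP BY movieid;
```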

Next, a simple use case of data partitioning is shown.

Partitioned Table

Partitioning a table enables you to segregate data into multiple namespaces and filter and query the

data set based on the namespace identifiers. Say a data analyst believed that ratings were affected by

the time of day at which users submitted them and wanted to split the ratings into two partitions, one

for all ratings submitted between 8 p.m. and 8 a.m. and the other for the rest of the day. You could

create a virtual column to identify this partition and save the data as such.

Then you would be able to filter, search, and cluster on the basis of these namespaces.
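
One way this use case might be sketched in HiveQL follows. It rests on several assumptions that do not come from the text above: a source table named ratings with columns userid, movieid, rating, and tstamp; tstamp holding Unix epoch seconds as a string; and the hypothetical names ratings_partitioned and time_of_day with the partition values 'night' and 'day'. The hour boundaries are evaluated in the Hive session's time zone.

```sql
-- Hypothetical partitioned copy of the assumed ratings table.
-- time_of_day is the partition column; it is not stored in the data files.
CREATE TABLE ratings_partitioned (
  userid  INT,
  movieid INT,
  rating  INT,
  tstamp  STRING
)
PARTITIONED BY (time_of_day STRING);

-- Let Hive derive the partition value from the query output.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Assumes tstamp is epoch seconds; 8 p.m. to 8 a.m. goes to 'night',
-- everything else to 'day'. The partition column must come last.
INSERT OVERWRITE TABLE ratings_partitioned PARTITION (time_of_day)
SELECT userid, movieid, rating, tstamp,
       CASE
         WHEN hour(from_unixtime(CAST(tstamp AS BIGINT))) >= 20
           OR hour(from_unixtime(CAST(tstamp AS BIGINT))) < 8
         THEN 'night'
         ELSE 'day'
       END AS time_of_day
FROM ratings;

-- Filtering on the partition column reads only the matching partition.
SELECT AVG(rating)
FROM ratings_partitioned
WHERE time_of_day = 'night';
```

Because each partition value maps to its own directory, the final query only needs to scan the 'night' data rather than the whole table.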

SUMMARY

This chapter tersely depicted the power and flexibility of Hive. It showed how the old goodness of

SQL can be combined with the power of Hadoop to deliver a compelling data analysis tool, one that

both traditional RDBMS developers and new big data pioneers can use.

Hive was built at Facebook and was open sourced as a subproject of Hadoop. Now a top-level

project, Hive continues to evolve rapidly, bridging the gap between the SQL and the NoSQL worlds.

Prior to Hive's release as open source, Hadoop was arguably useful only to a subset of developers in

any given group needing to access “big data” in their organization. Some say Hive nullifies the use

of the buzzword NoSQL, the topic of this book. It almost makes some forcefully claim that NoSQL

is actually an acronym that expands out to Not Only SQL.
