Figure 6-20. Fetching records from the employee_ext external table
Hive with Cassandra
With external table support, Hive can be used for data analytics over Cassandra
by means of a Cassandra-specific storage handler implementation. DataStax
(www.datastax.com) provides commercial products such as DataStax Enterprise
(DSE), which provides seamless integration with tools such as Hive. We will
explore the DSE offering further in the “Apache Sqoop” section later in this chapter.
In this section, we will discuss integrating open source Apache Cassandra with
Hive. One open source implementation of a Cassandra-specific storage handler is
available at https://github.com/tuplejump/cash. You can either git clone the
repository or download the source as a zip file, and then build it locally to
produce the jars. Alternatively, you can find these jars in the jars folder of
the source attachment.
For our example, we will create a table in Cassandra and then create external tables
in Hive over it to explore the data inserted via both Cassandra and Hive.
1. First, let's create a twitter keyspace and twitterdata table:
create keyspace twitter with replication={'class':'SimpleStrategy','replication_factor':2};
use twitter;
create table twitterdata(tweet_id timeuuid primary key, body text, tweeted_by text);
2. Let's insert a few records with the insert command:
insert into twitterdata(tweet_id,body,tweeted_by) values(now(),'my first tweet','@mevivs');
insert into twitterdata(tweet_id,body,tweeted_by)
 