Figure 6-20. Fetching records from the employee_ext external table
Hive with Cassandra
With external table support, it is possible to use Hive for data analytics over Cassandra by using a Cassandra-specific storageHandler implementation. DataStax Enterprise (DSE) also provides seamless integration with tools such as Hive. We will explore the DSE offering further in the "Apache Sqoop" section later in this chapter.
In this section, we will discuss integrating open source Apache Cassandra with Hive. One open source implementation is a Cassandra-specific storage handler; you can download its source as a zip and build it locally to produce the jars. Alternatively, you can find these jars under the jars folder of the source attachment.
For our example, we will create a table in Cassandra and then create external tables in Hive to explore the data inserted via both Cassandra and Hive.
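In the steps that follow, tweet_id is declared as a timeuuid column and populated with CQL's now() function, which generates a new time-based (version 1) UUID on each call. As a rough illustration of what such a value carries, here is a minimal Python sketch using the standard uuid module. It only mimics the kind of value now() returns and is not Cassandra's own generator; the helper names new_tweet_id and unix_millis are hypothetical.

```python
import uuid

# Number of 100-ns intervals between the start of the Gregorian calendar
# (1582-10-15), which version 1 UUIDs count from, and the Unix epoch.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def new_tweet_id() -> uuid.UUID:
    """Hypothetical helper: a version 1 (time-based) UUID, similar in kind to CQL now()."""
    return uuid.uuid1()


def unix_millis(tid: uuid.UUID) -> int:
    """Hypothetical helper: the timestamp embedded in a version 1 UUID, as Unix milliseconds."""
    if tid.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # UUID.time is the 60-bit count of 100-ns intervals since 1582-10-15.
    return (tid.time - GREGORIAN_TO_UNIX_100NS) // 10_000


if __name__ == "__main__":
    # Each call yields a distinct ID that still records when it was created.
    for tid in (new_tweet_id() for _ in range(3)):
        print(tid, tid.version, unix_millis(tid))
```

Because the creation time is embedded in the key itself, a timeuuid suits tweet-like data where each row needs a unique identifier that also records when it was written.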
1. First, let's create a twitter keyspace and a twitterdata table:
create keyspace twitter with replication = {'class':'SimpleStrategy', 'replication_factor':2};
use twitter;
create table twitterdata(tweet_id timeuuid primary key, body text, tweeted_by text);
2. Let's insert a few records with the insert command:
insert into
twitterdata(tweet_id,body,tweeted_by)
values(now(),'my first tweet','@mevivs');
insert into
twitterdata(tweet_id,body,tweeted_by)