Database Reference
In-Depth Information
6.
At last, connect to the cql shell and verify the loaded data (see Figure
6-13 ) :
Select * from twitterdata;
Figure 6-13 . The console depicts the output of data persisted in the twitterdata table
You must have noticed that total tweet count is 10,020, but still twitterdata in
Cassandra got populated with 11 rows. The reason is that multiple tweets are present
for the same screen name, and it is defined as the primary key in the twitterdata
table.
In the real world, it is certainly possible to have a nonunique field in data files and
the user may want Cassandra to take care of the unique key part. We can solve this
problem by defining an implicit primary key of type timeuuid and populate it ex-
ternal to original data file. In the next exercise, we will try to solve this problem by in-
troducing a unique primary key.
Loading Sata with timeuuid
Pig Latin does not have any direct support for Cassandra data types such as
timeuuid and uuid . But there is an open source project called Pygmalion ht-
tps://github.com/jeromatron/pygmalion , which provides a number of
Pig utilities for Cassandra.
 
 
Search WWH ::




Custom Search