Database Reference
In-Depth Information
6.
Select * from twitterdata;
Figure 6-13
.
The console depicts the output of data persisted in the twitterdata table
You must have noticed that total tweet count is 10,020, but still
twitterdata
in
Cassandra got populated with 11 rows. The reason is that multiple tweets are present
for the same screen name, and it is defined as the primary key in the
twitterdata
table.
In the real world, it is certainly possible to have a nonunique field in data files and
the user may want Cassandra to take care of the unique key part. We can solve this
problem by defining an implicit primary key of type
timeuuid
and populate it ex-
ternal to original data file. In the next exercise, we will try to solve this problem by in-
troducing a unique primary key.
Loading Sata with timeuuid
Pig Latin does not have any direct support for Cassandra data types such as
tps://github.com/jeromatron/pygmalion
, which provides a number of
Pig utilities for Cassandra.