Data Migration and Analytics - Beginning Apache Cassandra Development

Figure 6-1 shows a MapReduce job that counts tweets by date in the business layer
(programmatically) and stores the output in a database. Here the MapReduce
algorithm is implemented at the application level. Such implementations can be
written in Java or any other language.
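The application-level approach described above can be sketched in a few lines. This is only an illustration in plain Python, not the book's implementation: the sample tweets, the function names (map_tweet, shuffle, reduce_counts, count_tweets_by_date), and the timestamp format are all assumptions, and the step that stores the result in a database is left out.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample data: (ISO-8601 timestamp, tweet text).
TWEETS = [
    ("2014-03-01T09:15:00", "Learning Cassandra"),
    ("2014-03-01T17:40:00", "MapReduce basics"),
    ("2014-03-02T08:05:00", "Counting tweets by date"),
]


def map_tweet(tweet):
    """Map step: emit a (date, 1) pair for a single tweet."""
    timestamp, _text = tweet
    day = datetime.fromisoformat(timestamp).date().isoformat()
    return (day, 1)


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(day, values):
    """Reduce step: sum the values emitted for one date."""
    return (day, sum(values))


def count_tweets_by_date(tweets):
    """Run map, shuffle, and reduce in-process and return {date: count}."""
    pairs = [map_tweet(tweet) for tweet in tweets]
    grouped = shuffle(pairs)
    return dict(reduce_counts(day, values) for day, values in grouped.items())


if __name__ == "__main__":
    print(count_tweets_by_date(TWEETS))
```

In a real job the map and reduce steps would run in parallel across nodes; here they run in a single process so that the data flow is easy to follow.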

Figure 6-1. Image depicting standard implementation of tweet analytics at the business layer

In another implementation of the same use case, backend programmers or data
scientists may prefer to write MapReduce scripts or functions on the database
side, as shown in Figure 6-2.

Figure 6-2. An image depicting implementation of tweet analytics at the database layer

Developing such MapReduce programs for almost everything is not maintainable,
and it is time-consuming as well. That is where an SQL-like language or a simple
script can be really helpful. Database administrators or data scientists might
prefer a database-based approach (Figure 6-2) for quick implementation. From this
we can conclude that there is a need for such tools. For example, they come in
very handy for implementing a scheduled job that extracts data from large data
sets and stores the output.
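To illustrate why an SQL-like language is attractive, the tweets-by-date count can be written as a single declarative query. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for whichever SQL-like tool is actually used. The table name tweets, its columns, and the function count_tweets_by_date_sql are assumptions made for the example.

```python
import sqlite3


def count_tweets_by_date_sql(rows):
    """Count tweets per date with one GROUP BY query.

    `rows` is an iterable of (date_string, tweet_text) pairs. The table and
    column names are hypothetical and exist only for this illustration.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE tweets (tweeted_on TEXT, body TEXT)")
        conn.executemany("INSERT INTO tweets VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT tweeted_on, COUNT(*) FROM tweets "
            "GROUP BY tweeted_on ORDER BY tweeted_on"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("2014-03-01", "Learning Cassandra"),
        ("2014-03-01", "MapReduce basics"),
        ("2014-03-02", "Counting tweets by date"),
    ]
    print(count_tweets_by_date_sql(sample))
```

The grouping and counting logic is expressed in the query itself, so no hand-written map or reduce functions need to be maintained.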

This raises a question: are there any tools or open source libraries available
as ready-to-use solutions? Tools like Apache Hive, Apache Pig, and Sqoop
