Database and Data Management - Field Guide to Hadoop

Database Reference

In-Depth Information

LOAD DATA LOCAL INPATH 'reviews.csv'

OVERWRITE INTO TABLE movie_reviews

Now we are ready to perform some sort of analysis. Let's say, in this case, we want to find

the average rating for the movie Dune :

Select AVG(rating) FROM movie_reviews WHERE title = 'Dune';

Spark SQL (formerly Shark)

License

Apache License, Version 2.0

Activity

High

Purpose

SQL access to Hadoop Data

Official Page

Hadoop Integration API Compatible

If you need SQL access to your data, and Hive (described here ) is a bit underperforming, and

you're willing to commit to a Spark environment (described here ) , then you need to consider

Spark SQL. SQL access in Spark was originally called the Shark project, and was a port of

Hive, but Shark has ceased development and its successor, Spark SQL, is now the mainline

SQL project on Spark. The blog post “Shark, Spark SQL, Hive on Spark, and the Future of

SQL on Spark” provides more information about the change. Spark SQL, like Spark, has an

in-memory computing model, which helps to account for its speed. It's only in recent years

that decreasing memory costs have made large memory Linux servers ubiquitous, thus lead-

ing to recent advances in in-memory computing for large datasets. Because memory access

times are usually 100 times as fast as disk access times, it's quite appealing to keep as much

Search WWH ::

Custom Search

Home