Database Reference
In-Depth Information
LOAD DATA LOCAL INPATH 'reviews.csv'
OVERWRITE INTO TABLE movie_reviews
Now we are ready to perform some sort of analysis. Let's say, in this case, we want to find
the average rating for the movie Dune :
Select AVG(rating) FROM movie_reviews WHERE title = 'Dune';
Spark SQL (formerly Shark)
License
Apache License, Version 2.0
Activity
High
Purpose
SQL access to Hadoop Data
Official Page
http://spark.apache.org/sql/
Hadoop Integration API Compatible
If you need SQL access to your data, and Hive (described here ) is a bit underperforming, and
you're willing to commit to a Spark environment (described here ) , then you need to consider
Spark SQL. SQL access in Spark was originally called the Shark project, and was a port of
Hive, but Shark has ceased development and its successor, Spark SQL, is now the mainline
SQL project on Spark. The blog post “Shark, Spark SQL, Hive on Spark, and the Future of
SQL on Spark” provides more information about the change. Spark SQL, like Spark, has an
in-memory computing model, which helps to account for its speed. It's only in recent years
that decreasing memory costs have made large memory Linux servers ubiquitous, thus lead-
ing to recent advances in in-memory computing for large datasets. Because memory access
times are usually 100 times as fast as disk access times, it's quite appealing to keep as much
Search WWH ::




Custom Search