Databases Reference
In-Depth Information
4 835
5 820
Time taken: 24.908 seconds
You can include multiple aggregation functions, like count, sum, and average, in a single query as
long as they all operate on the same column. You are not allowed to run aggregation functions on
multiple columns in the same query.
To run aggregation at the map level, you could set hive.map.aggr to true and run a count query
as follows:
set hive.map.aggr=true;
SELECT COUNT(*) FROM ratings;
Hive QL also supports ordering of result sets in ascending and descending order using the ORDER BY
clause. To get all records from the movies tables ordered by movieid in descending order, you can
query as follows:
hive> SELECT * FROM movies
> ORDER BY movieid DESC;
Available for
download on
Wrox.com
hive_movielens.txt
Hive has another ordering facility. It's SORT BY , which is similar to ORDER BY in that it orders
records in ascending or descending order. However, unlike ORDER BY , SORT BY applies ordering on
a per-reducer basis. This means the fi nal result set may be partially ordered. All records managed by
the same reducer will be ordered but records across reducers will not be ordered.
Hive allows partitioning of data sets on the basis of a virtual column. You can distribute partitioned
data to separate reducers by using the DISTRIBUTE BY method. Data distributed to different
reducers can be sorted on a per-reducer basis. Shorthand for DISTRIBUTE BY and ORDER BY together
is CLUSTER BY .
Hive QL's SQL-like syntax and semantics is very inviting for developers who are familiar with
RDBMS and SQL and want to explore the world of large data processing with Hadoop using
familiar tools. SQL developers who start exploring Hive soon start craving their power tool: the
SQL join. Hive doesn't disappoint even in this facility. Hive QL supports joins.
JOIN(S) IN HIVE QL
Hive supports equality joins, outer joins, and left semi-joins. To get a list of movie ratings with
movie titles you can obtain the result set by joining the ratings and the movies tables. You can
query as follows:
hive> SELECT ratings.userid, ratings.rating, ratings.tstamp, movies.title
> FROM ratings JOIN movies
> ON (ratings.movieid = movies.movieid)
> LIMIT 5;
Search WWH ::




Custom Search