Database Reference
In-Depth Information
CREATE VIEW max_temperatures (station, year, max_temperature)
AS
SELECT station, year, MAX(temperature)
FROM valid_records
GROUP BY station, year;
In this view definition, we list the column names explicitly. We do this because the max-
imum temperature column is an aggregate expression, and otherwise Hive would create a
column alias for us (such as _c2 ). We could equally well have used an AS clause in the
SELECT to name the column.
With the views in place, we can now use them by running a query:
SELECT station, year, AVG(max_temperature)
FROM max_temperatures
GROUP BY station, year;
The result of the query is the same as that of running the one that uses a subquery. In par-
ticular, Hive creates the same number of MapReduce jobs for both: two in each case, one
for each GROUP BY . This example shows that Hive can combine a query on a view into a
sequence of jobs that is equivalent to writing the query without using a view. In other
words, Hive won't needlessly materialize a view, even at execution time.
Views in Hive are read-only, so there is no way to load or insert data into an underlying
base table via a view.
Search WWH ::




Custom Search