Database Reference
In-Depth Information
NOTE
Hive (like MySQL and Oracle) allows you to list the join tables in the FROM clause and specify the join
condition in the WHERE clause of a SELECT statement. For example, the following is another way of ex-
pressing the query we just saw:
SELECT sales.*, things.*
FROM sales, things
WHERE sales.id = things.id;
A single join is implemented as a single MapReduce job, but multiple joins can be per-
formed in less than one MapReduce job per join if the same column is used in the join
condition. [ 117 ] You can see how many MapReduce jobs Hive will use for any particular
query by prefixing it with the EXPLAIN keyword:
EXPLAIN
SELECT sales.*, things.*
FROM sales JOIN things ON (sales.id = things.id);
The EXPLAIN output includes many details about the execution plan for the query, in-
cluding the abstract syntax tree, the dependency graph for the stages that Hive will ex-
ecute, and information about each stage. Stages may be MapReduce jobs or operations
such as file moves. For even more detail, prefix the query with EXPLAIN EXTENDED .
Hive currently uses a rule-based query optimizer for determining how to execute a query,
but a cost-based optimizer is available from Hive 0.14.0.
Outer joins
Outer joins allow you to find nonmatches in the tables being joined. In the current ex-
ample, when we performed an inner join, the row for Ali did not appear in the output, be-
cause the ID of the item she purchased was not present in the things table. If we change
the join type to LEFT OUTER JOIN , the query will return a row for every row in the left
table ( sales ), even if there is no corresponding row in the table it is being joined to
( things ):
hive> SELECT sales.*, things.*
> FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);
Joe 2 2 Tie
Hank 4 4 Coat
Ali 0 NULL NULL
Eve 3 3 Hat
Hank 2 2 Tie
Search WWH ::




Custom Search