Hive - Hadoop: The Definitive Guide

Further Reading

For more information about Hive, see Programming Hive by Edward Capriolo, Dean Wampler, and Jason Rutherglen (O'Reilly, 2012).

[ 106 ] Toby Segaran and Jeff Hammerbacher, Beautiful Data: The Stories Behind Elegant Data Solutions (O'Reilly, 2009).

[ 107 ] It is assumed that you have network connectivity from your workstation to the Hadoop cluster. You can test this before running Hive by installing Hadoop locally and performing some HDFS operations with the hadoop fs command.

[ 108 ] The properties have the javax.jdo prefix because the metastore implementation uses the Java Data Objects (JDO) API for persisting Java objects. Specifically, it uses the DataNucleus implementation of JDO.

[ 109 ] Or see the Hive function reference.

[ 110 ] The move will succeed only if the source and target filesystems are the same. Also, there is a special case when the LOCAL keyword is used, where Hive will copy the data from the local filesystem into Hive's warehouse directory (even if it, too, is on the same local filesystem). In all other cases, though, LOAD is a move operation and is best thought of as such.

[ 111 ] You can also use INSERT OVERWRITE DIRECTORY to export data to a Hadoop filesystem.

[ 112 ] However, partitions may be added to or removed from a table after creation using an ALTER TABLE statement.

[ 113 ] The fields appear to run together when displaying the raw file because the separator character in the output is a nonprinting control character. The control characters used are explained in the next section.
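As a rough illustration of why the fields seem to run together, the sketch below builds one row separated by Ctrl-A (\x01), which is Hive's default field delimiter, and then splits it back into fields. The row contents are invented for the example and do not come from the book's dataset.

```python
# Hypothetical row as it might appear in a raw Hive output file.
# The values are made up; only the \x01 (Ctrl-A) separator matters here.
raw_line = "1950\x010\x011"

# Printed directly, the Ctrl-A separators are typically invisible,
# so the three fields look like one run of characters.
print(raw_line)

# repr() exposes the separators as \x01 escapes.
print(repr(raw_line))

# Splitting on the control character recovers the individual fields.
fields = raw_line.split("\x01")
print(fields)
```

Viewing a file with a tool that shows control characters (for example, cat -v prints Ctrl-A as ^A) gives the same insight without writing any code.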

[ 114 ] The default format can be changed by setting the property hive.default.fileformat.

[ 115 ] Sometimes you need to use parentheses for regular expression constructs that you don't want to count as a capturing group. For example, the pattern (ab)+ matches a string of one or more occurrences of ab, but it also captures a group. The solution is to use a noncapturing group, which has a ? character after the first parenthesis. There are various noncapturing group constructs (see the Java documentation), but in this example we could use (?:ab)+ to avoid capturing the group as a Hive column.
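The difference between the two patterns can be sketched outside Hive. Hive's regular expressions follow Java syntax; the example below uses Python's re module instead, on the assumption that the (?:...) construct behaves the same way for this simple case. The sample line and the trailing (\d+) group are hypothetical additions so that there is one group we do want to keep.

```python
import re

# Hypothetical input line: a run of "ab" followed by a number we want as a column.
line = "ababab 42"

# With a capturing group, (ab)+ contributes an unwanted extra group.
capturing = re.compile(r"(ab)+ (\d+)")

# With a noncapturing group, only (\d+) is counted.
noncapturing = re.compile(r"(?:ab)+ (\d+)")

print(capturing.groups)                    # number of capturing groups in the pattern
print(capturing.match(line).groups())      # includes the last "ab" repetition
print(noncapturing.groups)
print(noncapturing.match(line).groups())   # only the number remains
```

Because each capturing group maps to a column, the first pattern would need two columns in the table definition, while the second needs only one.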

[ 116 ] This is a reworking in Hive of the discussion in Secondary Sort.

[ 117 ] The order of the tables in the JOIN clauses is significant. It's generally best to have the largest table last, but see the Hive wiki for more details, including how to give hints to the Hive planner.
