Further Reading
For more information about Hive, see Programming Hive by Edward Capriolo, Dean Wampler, and Jason Rutherglen (O'Reilly, 2012).
[106] Toby Segaran and Jeff Hammerbacher, Beautiful Data: The Stories Behind Elegant Data Solutions (O'Reilly, 2009).
[107] It is assumed that you have network connectivity from your workstation to the Hadoop cluster. You can test this before running Hive by installing Hadoop locally and performing some HDFS operations with the hadoop fs command.
[108] The properties have the javax.jdo prefix because the metastore implementation uses the Java Data Objects (JDO) API for persisting Java objects. Specifically, it uses the DataNucleus implementation of JDO.
[109] Or see the Hive function reference.
[110] The move will succeed only if the source and target filesystems are the same. Also, there is a special case when the LOCAL keyword is used, where Hive will copy the data from the local filesystem into Hive's warehouse directory (even if it, too, is on the same local filesystem). In all other cases, though, LOAD is a move operation and is best thought of as such.
[112] However, partitions may be added to or removed from a table after creation using an ALTER TABLE statement.
[113] The fields appear to run together when displaying the raw file because the separator character in the output is a nonprinting control character. The control characters used are explained in the next section.
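As a rough illustration of why the fields seem to run together, the sketch below assumes the separator is Ctrl-A (\x01), which is Hive's default field delimiter. The row contents and variable names are hypothetical and are not taken from the text.

```python
# Assumption: fields are separated by Ctrl-A (\x01), Hive's default
# field delimiter. The row below is a made-up example.
SEPARATOR = "\x01"
raw_line = "1" + SEPARATOR + "hello"

# Printing the raw line usually shows the fields with nothing visible
# between them, because \x01 is a nonprinting control character.
print(raw_line)

# repr() makes the hidden separator visible as an escape sequence.
print(repr(raw_line))

# Splitting on the separator recovers the individual fields.
fields = raw_line.split(SEPARATOR)
print(fields)
```

Viewing the file with a tool that escapes control characters, as repr() does here, is one way to check which delimiter was actually written.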
[115] Sometimes you need to use parentheses for regular expression constructs that you don't want to count as a capturing group; for example, the pattern (ab)+ for matching a string of one or more ab characters. The solution is to use a noncapturing group, which has a ? character after the first parenthesis. There are various noncapturing group constructs (see the Java documentation), but in this example we could use (?:ab)+ to avoid capturing the group as a Hive column.
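The difference between the two patterns can be seen in a small sketch. It uses Python's re module rather than Java, on the assumption that the (?:...) construct behaves the same way in both; the sample string and the trailing (\w+) group are made up for illustration.

```python
import re

# Hypothetical input: a run of "ab" pairs, a dash, then a word.
text = "ababab-xyz"

# With (ab)+ the parentheses count as a capturing group, so the
# pattern exposes two groups.
capturing = re.compile(r"(ab)+-(\w+)")

# With (?:ab)+ the parentheses only group the repetition, so the
# pattern exposes a single group.
noncapturing = re.compile(r"(?:ab)+-(\w+)")

print(capturing.groups, capturing.match(text).groups())
print(noncapturing.groups, noncapturing.match(text).groups())
```

Since each capturing group corresponds to a column, the noncapturing form keeps the repeated ab run from producing an extra one.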
[116] This is a reworking in Hive of the discussion in Secondary Sort.
[117] The order of the tables in the JOIN clauses is significant. It's generally best to have the largest table last, but see the Hive wiki for more details, including how to give hints to the Hive planner.