)
LOCATION 'user/MyCustomerTable';
You can use the LOCATION option with managed tables as well, but it's not
necessary unless you want a table that Hive manages to be stored in
a directory that Hive doesn't manage. For clarity, it's recommended that
LOCATION be used only with external tables.
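For illustration, a complete external table definition using LOCATION might look like the following sketch; the table name and columns are assumptions, and the path reuses the one shown above:

-- EXTERNAL means Hive does not own the data files; dropping the table leaves them in place.
CREATE EXTERNAL TABLE MyCustomerTable (
    CustomerID INT,
    CustomerName STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'user/MyCustomerTable';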
WARNING
Be aware that, regardless of whether the table is managed or external,
the data is still accessible through the Hadoop file system. Files can be
added or deleted by anyone with access to Hadoop. So, even for
managed tables, Hive doesn't really take full control of the data files.
Adding and Deleting Data
Remember from the earlier discussion about the differences between Hive and
relational systems that Hive uses Hadoop for storage, so it does not support
row-level operations. You can't insert, update, or delete individual rows.
However, because Hive is designed for big data, you would want to perform
bulk operations in any case, so this isn't a significant restriction.
Perhaps the simplest way to add data to a Hive table is to write or copy
a properly formatted file to the table's directory directly, using HDFS.
(Commands for copying files directly in HDFS are covered in Chapter 5,
“Storing and Managing Data in HDFS.”)
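For example, assuming a table whose data lives in the directory /user/hive/warehouse/customers (an illustrative path; yours may differ), a properly formatted file could be copied in with a standard HDFS command:

# Copy a local file into the table's directory (file name and path are illustrative).
hdfs dfs -put customers.csv /user/hive/warehouse/customers/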
You can load data from existing files into a table using the LOAD DATA
command. This is similar to using a BULK INSERT statement in SQL Server.
All the data in the specified location will be loaded into the table. However,
in SQL Server, BULK INSERT references a single data file. LOAD DATA
is usually pointed at a directory, so that all files in the directory can be
imported. Another important difference is that, while SQL Server verifies
the data in a bulk load, Hive only verifies that the file format matches the
table definition. It does not check that the record format matches what has
been specified for the table:
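As a sketch, a load from an HDFS directory might look like this; the staging directory and table name are assumptions for illustration:

-- Load every file in the staging directory into the table.
-- With INPATH (as opposed to LOCAL INPATH), the files are moved, not copied,
-- into the table's directory.
LOAD DATA INPATH '/user/staging/customers'
INTO TABLE MyCustomerTable;

Adding OVERWRITE before INTO TABLE replaces the table's existing contents instead of appending to them.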