)
LOCATION 'user/MyCustomerTable';
You can use the LOCATION option with managed tables as well, but it's not
necessary unless you want a table that Hive manages to be stored in
a directory that Hive doesn't manage. For clarity, it's recommended that
LOCATION be used only with external tables.
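For illustration, a complete external table definition using LOCATION might look like the following sketch; the table name and columns are assumptions, and the path reuses the one shown above:

-- EXTERNAL means Hive does not own the data files; dropping the table leaves them in place.
CREATE EXTERNAL TABLE MyCustomerTable (
    CustomerID INT,
    CustomerName STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'user/MyCustomerTable';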
WARNING
Be aware that, regardless of whether the table is managed or external,
the data is still accessible through the Hadoop file system. Files can be
added or deleted by anyone with access to Hadoop. So, even for
managed tables, Hive doesn't really take full control of the data files.
Adding and Deleting Data
Remember from the earlier discussion about the differences between Hive and
relational systems that Hive uses Hadoop for storage, so it does not support
row-level operations. You can't insert, update, or delete individual rows.
However, because Hive is designed for big data, you would want to perform
bulk operations in any case, so this isn't a significant restriction.
Perhaps the simplest way to add data to a Hive table is to write or copy
a properly formatted file to the table's directory directly, using HDFS.
(Commands for copying files directly in HDFS are covered in Chapter 5,
“Storing and Managing Data in HDFS.”)
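For example, assuming a table whose data lives in the directory /user/hive/warehouse/customers (an illustrative path; yours may differ), a properly formatted file could be copied in with a standard HDFS command:

# Copy a local file into the table's directory (file name and path are illustrative).
hdfs dfs -put customers.csv /user/hive/warehouse/customers/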
You can load data from existing files into a table using the LOAD DATA
command. This is similar to using a BULK INSERT statement in SQL Server.
All the data in the specified location will be loaded into the table. However,
in SQL Server, BULK INSERT references a single data file. LOAD DATA
is usually pointed at a directory, so that all files in the directory can be
imported. Another important difference is that, while SQL Server verifies
the data in a bulk load, Hive only verifies that the file format matches the
table definition. It does not check that the record format matches what has
been specified for the table:
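As a sketch, a load from an HDFS directory might look like this; the staging directory and table name are assumptions for illustration:

-- Load every file in the staging directory into the table.
-- With INPATH (as opposed to LOCAL INPATH), the files are moved, not copied,
-- into the table's directory.
LOAD DATA INPATH '/user/staging/customers'
INTO TABLE MyCustomerTable;

Adding OVERWRITE before INTO TABLE replaces the table's existing contents instead of appending to them.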