Adding Structure with Hive - Microsoft Big Data Solutions

Third-Party SerDes

Third-party SerDes are available for Hive as well. Examples include CSVSerde (https://github.com/ogrodnek/csv-serde), which handles CSV files with embedded quotes and delimiters, and a JSON SerDe (https://github.com/cloudera/…/main/java/com/cloudera/hive/serde/JSONSerDe.java), which parses records stored as JSON objects.
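On the read path, a SerDe turns one raw line into typed column values. The following Python sketch is not Hive code and does not reproduce how CSVSerde is implemented. It only illustrates why a naive split on the delimiter breaks on quoted fields, and what a CSV-aware parser does instead. The schema and the function names (`naive_split`, `deserialize_csv`) are hypothetical.

```python
import csv

# Hypothetical table schema: (column name, converter) pairs, standing in for
# the column list a Hive table definition would declare.
SCHEMA = [("id", int), ("name", str), ("comment", str)]


def naive_split(line):
    """Split on every comma, the way a plain delimited format would."""
    return line.split(",")


def deserialize_csv(line, schema=SCHEMA):
    """Parse one CSV record into a typed row, honoring quoted fields."""
    # csv.reader treats "..." as a single field even if it contains commas,
    # and turns a doubled quote ("") inside it into one literal quote.
    fields = next(csv.reader([line]))
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    return {name: convert(value) for (name, convert), value in zip(schema, fields)}


record = '42,"Smith, Jane","She said ""hi"""'
print(naive_split(record))      # 4 pieces: the comma inside "Smith, Jane" splits a field
print(deserialize_csv(record))  # {'id': 42, 'name': 'Smith, Jane', 'comment': 'She said "hi"'}
```

A real SerDe also handles serialization (writing rows back out) and reports its column types to Hive. The sketch covers only the parsing half.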

Hive has robust support for both standard and complex data types, stored in a wide variety of formats. As highlighted in the preceding section, if support for a particular file format is not built in, it can be added through third-party add-ons or custom implementations. This flexibility suits the kinds of data often found in Hadoop data stores. Hive's ability to apply a tabular structure to the data makes that data easier for users and tools to consume. But there is another component to making access much easier for existing tools, which is discussed next.
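Applying a tabular structure means the files stay as they are, and a table definition is laid over them when the data is read. The Python sketch below imitates that idea for JSON records. It is an illustration under assumed names (`COLUMNS`, `read_table`), not Hive's implementation. Each record is projected onto the declared columns: undeclared keys are ignored, and missing keys become `None`, the sketch's stand-in for SQL `NULL`.

```python
import json

# Hypothetical table definition: column names and types, applied at read
# time. The raw records themselves are never rewritten.
COLUMNS = [("user_id", int), ("country", str), ("tags", list)]


def read_table(lines, columns=COLUMNS):
    """Yield one tuple per JSON record, projected onto the declared columns."""
    for line in lines:
        obj = json.loads(line)
        row = []
        for name, col_type in columns:
            value = obj.get(name)  # missing key -> None (stand-in for NULL)
            row.append(None if value is None else col_type(value))
        yield tuple(row)


raw = [
    '{"user_id": "7", "country": "NZ", "tags": ["a", "b"], "extra": 1}',
    '{"user_id": 8}',
]
for row in read_table(raw):
    print(row)
# (7, 'NZ', ['a', 'b'])
# (8, None, None)
```

Because the structure lives in the table definition rather than in the files, the same raw data could be read through a different definition without being copied or converted.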

Enabling Data Access and Transformation

Traditional users of data warehouses expect to be able to query and transform data using SQL. They typically run that SQL through applications that rely on common middleware to provide a standard interface to the data. Most relational database management systems (RDBMSs) support one or more of these middleware interfaces. Open Database Connectivity (ODBC) is a common one and has been around since the early 1990s. Other common interfaces include the following:

• ADO.NET (used by Microsoft .NET-based applications)

• OLE DB

• Java Database Connectivity (JDBC)

Because ODBC is one of the original interfaces of this kind, it is well supported by existing applications, and many of the other interfaces provide bridges to ODBC.
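A shared middleware interface lets client code be written once against the interface rather than against a specific database. Python has an analogous convention, the DB-API 2.0 specification (PEP 249). The third-party pyodbc package, for example, exposes ODBC data sources through it. The sketch below uses DB-API as a stand-in for ODBC, with the standard-library sqlite3 module substituting for a Hive or ODBC driver so the example runs anywhere. The `top_rows` function and the `sales` table are hypothetical.

```python
import sqlite3


def top_rows(conn, table, limit=3):
    """Run a query using only generic DB-API calls.

    cursor(), execute(), fetchall() and close() are defined by PEP 249, so
    this function does not depend on which compliant driver made `conn`.
    """
    cur = conn.cursor()
    try:
        # Table name is interpolated for brevity; never do this with
        # untrusted input.
        cur.execute(f"SELECT * FROM {table} ORDER BY 1 LIMIT {int(limit)}")
        return cur.fetchall()
    finally:
        cur.close()


# Setup uses sqlite3's non-standard conn.execute shortcut; only top_rows
# is meant to be driver-independent.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "east"), (2, "west"), (3, "north"), (4, "south")],
)
print(top_rows(conn, "sales", limit=2))  # [(1, 'east'), (2, 'west')]
conn.close()
```

Swapping in a different compliant driver would change only the `connect` call and setup, not `top_rows`. This is the same property that lets an ODBC-aware application reach Hive once a suitable driver is installed.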
