$ export PATH=~/.lingual-client/bin/:$PATH
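To confirm that the export took effect, a minimal check can be run before going further. This is a sketch: `have_cmd` is a hypothetical helper, and it assumes the launcher is named `lingual`, as used later in this section.

```shell
# Sketch: check whether a command is visible on the current PATH.
# have_cmd is a hypothetical helper, not part of Lingual itself.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Warn early rather than failing later inside the shell.
if have_cmd lingual; then
  echo "lingual found at: $(command -v lingual)"
else
  echo "lingual not on PATH; check the export above" >&2
fi
```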
When using Lingual with Apache Hadoop, the SQL command shell expects certain environment variables to be set. That way, the correct Hadoop version and configuration will be included in the CLASSPATH:
HADOOP_HOME
    Path to local Hadoop installation
HADOOP_CONF_DIR
    Defaults to $HADOOP_HOME/conf
HADOOP_USER_NAME
    The username to use when submitting Hadoop jobs
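These variables can be sanity-checked before launching the shell. A minimal sketch, assuming only the variable names and the documented default above; `check_hadoop_env` is a hypothetical helper, not part of Lingual:

```shell
# Sketch: verify the Hadoop environment variables described above.
# check_hadoop_env is a hypothetical helper, not part of Lingual.
check_hadoop_env() {
  if [ -z "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  # Apply the documented default: HADOOP_CONF_DIR falls back to $HADOOP_HOME/conf.
  : "${HADOOP_CONF_DIR:=$HADOOP_HOME/conf}"
  export HADOOP_CONF_DIR
  echo "HADOOP_HOME=$HADOOP_HOME"
  echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
}
```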
Assuming that you have HADOOP_HOME already set, then:
$ export HADOOP_CONF_DIR=$HADOOP_HOME/conf
$ export HADOOP_USER_NAME=<username>
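A quick way to verify that HADOOP_CONF_DIR points at a real configuration directory is to look for a standard Hadoop site file. A sketch; `conf_dir_ok` is a hypothetical helper, and core-site.xml is simply a conventional Hadoop configuration file name:

```shell
# Sketch: check that a directory looks like a Hadoop conf directory.
# conf_dir_ok is a hypothetical helper, not part of Lingual or Hadoop.
conf_dir_ok() {
  [ -d "$1" ] && [ -f "$1/core-site.xml" ]
}

# Example usage:
# conf_dir_ok "$HADOOP_CONF_DIR" || echo "HADOOP_CONF_DIR looks wrong" >&2
```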
If you're working with a remote Elastic MapReduce cluster on Amazon AWS, see the […]tion files.
If you encounter errors executing SQL queries on a remote cluster (Amazon AWS, Windows Azure HDInsight, etc.), try the following workaround:
$ export HADOOP_USER_NAME=hadoop
That should resolve security issues that may be causing failures on the remote cluster.
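One way to apply this workaround defensively is to set the variable only when nothing is already configured, so a deliberately chosen username is not clobbered. A minimal sketch; the value hadoop comes from the workaround above:

```shell
# Sketch: set HADOOP_USER_NAME only if it is currently unset or empty.
: "${HADOOP_USER_NAME:=hadoop}"
export HADOOP_USER_NAME
echo "submitting jobs as: $HADOOP_USER_NAME"
```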
Now let's try using the Lingual SQL command shell. The following example is based on data from the MySQL Sample Employee Database:
$ mkdir -p ~/src/lingual
$ cd ~/src/lingual
$ curl http://data.cascading.org/employees.tgz | tar xvz
That creates an employees subdirectory for the table data, which is essentially several large CSV files. Next, load the schema for these tables into Lingual using SQL data definitions:
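Before loading the schema, it can be worth eyeballing the extracted data. A sketch; `summarize_csvs` is a hypothetical helper, and the .csv extension is an assumption about the tarball contents:

```shell
# Sketch: print a line count for each CSV file in a directory.
# summarize_csvs is a hypothetical helper, not part of Lingual.
summarize_csvs() {
  for f in "$1"/*.csv; do
    [ -e "$f" ] || continue   # glob matched nothing; skip
    printf '%s: %s lines\n' "$f" "$(wc -l < "$f")"
  done
}

# Example usage:
# summarize_csvs ~/src/lingual/employees
```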
$ curl http://data.cascading.org/create-employees.sh > create-employees.sh
$ chmod +x ./create-employees.sh
$ ./create-employees.sh local
Now try the SQL command line, querying to show a relational catalog for these tables:
$ lingual shell
0: jdbc:lingual:local> !tables
That lists metadata about the available tables: EMPLOYEE, TITLES, SALARIES. Next, let's try a simple query: