Database Reference
In-Depth Information
Now, the DateConv Hive UDF function is ready to be used in a Hive QL query by using just its short name,
DateConv. It will take a date with a structure of dd-MM-yyyy and return a date formatted to yyyy-MM-dd for use in a
Hive table. Next, I show how to invoke this function.
Table Creation
Suppose, as the next step in your ETL chain, you want to extract monthly purchase totals by supplier. You could, as
shown earlier, create a table with CREATE TABLE , using the IF NOT EXISTS clause to prevent duplication; for example,
I create the following table:
CREATE TABLE IF NOT EXISTS
trade.suppliertot
(
payyear INT,
paymonth INT,
supplier STRING,
totamount DOUBLE
);
Called “suppliertot” and residing in the trade database, this table would have four columns: one for the year of a
transaction, one for the month, one for the the supplier name, and one for the total amount of the purchases from that
supplier (indicated by the comma-separated list within parentheses). The table would not, however, contain any data
yet, as it has no LOCATION clause to link it to an HDFS directory.
Alternatively, I could create the table via a SELECT statement, which would automatically populate it with data. To
do that, I drop the table first and then run the CREATE statement, as follows:
DROP TABLE trade.suppliertot;
CREATE TABLE IF NOT EXISTS
trade.suppliertot
AS
SELECT
year(DateConv (paydate) ) as payyear,
month(DateConv (paydate) ) as paymonth,
supplier,
SUM(amount) as totamount
FROM
trade.rawtrans
GROUP BY
year(DateConv (paydate) ) ,
month(DateConv (paydate) ) ,
supplier ;
This statement uses the DateConv UDF that was created previously to convert date strings with the format dd/
MM/yyyy to the format yyyy-MM-dd; here, it is used against the “paydate” column. I use the same CREATE TABLE IF
NOT EXISTS option to create the table trade.suppliertot. However, I replace the list of columns in parentheses with an
AS SELECT statement that has four columns that takes data from the table trade.rawtrans via a FROM clause.
I can confirm that this second table now contains the data by issuing the statement SELECT COUNT(*) FROM
trade.suppliertot . This returns the result 391, indicating that the trade.suppliertot table contains 391 rows. The
COUNT(*) is a special aggregation function that returns the number of rows in a table.
 
Search WWH ::




Custom Search