Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
Starting Job = job_1394125045435_0001, Tracking URL = http://pivhdsne:8088/proxy/application_1394125045435_0001/
Kill Command = /usr/lib/gphd/hadoop/bin/hadoop job -kill job_1394125045435_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-03-06 12:30:23,542 Stage-1 map = 0%, reduce = 0%
2014-03-06 12:30:36,586 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.71 sec
2014-03-06 12:30:48,500 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.76 sec
MapReduce Total cumulative CPU time: 3 seconds 760 msec
Ended Job = job_1394125045435_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.76 sec HDFS Read: 242 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 760 msec
OK
0
When querying large tables, Hive outperforms and scales better than most
conventional databases. As stated earlier, Hive translates HiveQL queries
into MapReduce jobs that process pieces of large datasets in parallel.
To load the customer table with the contents of the HDFS file customer.txt, it is
only necessary to provide the HDFS path to the file.
hive> load data inpath '/user/customer.txt' into table customer;
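Note that load data inpath moves the file from its original HDFS location into Hive's warehouse directory. To copy a file from the local filesystem instead, add the local keyword; the path below is only an illustration, not a file from this example:

hive> load data local inpath '/tmp/customer.txt' into table customer;

With local, the source file is copied rather than moved, so the original remains in place.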
The following query displays three rows from the customer table.
hive> select * from customer limit 3;
34567678 Mary Jones mary.jones@isp.com
897572388 Harry Schmidt harry.schmidt@isp.com
89976576 Tom Smith thomas.smith@another_isp.com
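An aggregate over these rows is likewise compiled into a MapReduce job. As a sketch, the following query counts customers per email domain using Hive's built-in substr and instr functions; it assumes the third column is named email, which is a guess from the rows shown rather than a schema stated here:

hive> select substr(email, instr(email, '@') + 1) as domain,
count(*) as num_customers
from customer
group by substr(email, instr(email, '@') + 1);

The group by clause is what forces the reduce stage, just as in the single-reducer job shown above.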