Integration with Hadoop - Mastering Apache Cassandra

jps is a built-in tool provided by the Oracle JDK. It lists all the Java processes running on the machine. The previous snippet shows that all the Hadoop processes are up.
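If you would rather not eyeball the jps listing, the check can be scripted. The following is a minimal sketch, not part of the Hadoop distribution: check_hadoop_daemons is a hypothetical helper name, and the daemon names passed to it in the usage line are an assumption for a typical single-node setup, so adjust them to whatever your own jps output shows.

```shell
# Hypothetical helper: verify that every expected daemon appears in
# jps-style output ("<pid> <ClassName>" per line) read from stdin.
# The body runs in a subshell so its variables do not leak to the caller.
check_hadoop_daemons() (
  listing=$(cat)
  missing=0
  for daemon in "$@"; do
    # Compare against the second column only, matching the whole name,
    # so NameNode is not satisfied by SecondaryNameNode.
    if ! printf '%s\n' "$listing" | awk '{print $2}' | grep -qxF "$daemon"; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  exit "$missing"
)

# Usage (needs a JDK on the PATH; the daemon list is an assumption):
# jps | check_hadoop_daemons NameNode DataNode SecondaryNameNode
```

The function exits with a non-zero status and names each absent daemon, which makes it easy to use in a startup script.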

Let's execute an example and see whether things are actually working:

# Upload everything under the etc/hadoop directory to the "input" directory in HDFS
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/naishe
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hdfs dfs -ls input
Found 29 items
-rw-r--r-- 1 naishe supergroup 4436 2015-01-20 23:20 input/capacity-scheduler.xml
-rw-r--r-- 1 naishe supergroup 1335 2015-01-20 23:20 input/configuration.xsl
-rw-r--r-- 1 naishe supergroup 318 2015-01-20 23:20 input/container-executor.cfg
[ -- snip -- ]
-rw-r--r-- 1 naishe supergroup 690 2015-01-20 23:20 input/yarn-site.xml
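A quick way to confirm that the upload is complete is to compare the item count reported by HDFS with the number of local files. The sketch below assumes the listing starts with the "Found N items" header shown above; hdfs_item_count is a hypothetical helper name, not a Hadoop command.

```shell
# Hypothetical helper: print N from the "Found N items" header of
# `hdfs dfs -ls` output read from stdin.
hdfs_item_count() {
  awk '/^Found [0-9]+ items$/ { print $2; exit }'
}

# Usage (run from the Hadoop installation directory, as in the commands above):
# remote=$(bin/hdfs dfs -ls input | hdfs_item_count)
# local_count=$(ls etc/hadoop | wc -l)
# [ "$remote" -eq "$local_count" ] && echo "upload complete"
```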

All set; time to run an example on it. We will run an example that finds every string matching the regular expression dfs[a-z.]+ across all the files under the input directory and writes the counts to a directory called out.

# Execute the grep example
$ bin/hadoop jar hadoop-examples-*.jar grep input out 'dfs[a-z.]+'

15/01/20 23:27:10 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/01/20 23:27:10 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/01/20 23:27:10 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
15/01/20 23:27:11 INFO input.FileInputFormat: Total input paths to process : 29
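To get a feel for what the job computes, the same counting can be approximated on the local filesystem with standard tools. This is only a local stand-in, not the MapReduce job itself: count_matches is a hypothetical helper name, and it relies on the -o option that GNU and BSD grep provide.

```shell
# Hypothetical helper: count each distinct string matching an extended
# regex across the files directly under a directory.
# $1: regular expression, $2: directory of plain-text files
count_matches() {
  grep -ohsE -- "$1" "$2"/* |  # print only the matching parts, no file names
    sort |                     # group identical matches together
    uniq -c |                  # prefix each distinct match with its count
    sort -rn |                 # most frequent first
    sed 's/^ *//'              # drop the padding that uniq -c adds
}

# Usage (run from the Hadoop installation directory):
# count_matches 'dfs[a-z.]+' etc/hadoop
```

Each output line is a count followed by the matched string, so the result can be compared by eye with what the job writes to its output directory.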
