jps is a built-in tool provided by the Oracle JDK. It lists all the Java processes running
on the machine. The previous snippet shows that all the Hadoop processes are up.
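
If you would rather not read the jps listing by eye, the check can be scripted. The following is only a minimal sketch: missing_daemons is a hypothetical helper name, and the daemon names passed to it are an assumption about a pseudo-distributed setup, so substitute whichever processes your installation is supposed to run. It relies only on jps printing one "<pid> <class name>" pair per line.

```shell
# Print every expected daemon name that is absent from jps-style output on stdin.
# missing_daemons is a hypothetical helper; the expected names are passed as arguments.
missing_daemons() {
    # jps prints "<pid> <main class name>" on each line; keep only the second column.
    running=$(awk '{print $2}')
    for daemon in "$@"; do
        # -x matches whole lines only, so NameNode does not match SecondaryNameNode.
        # -F treats the name as a fixed string rather than a regular expression.
        printf '%s\n' "$running" | grep -qxF "$daemon" || printf '%s\n' "$daemon"
    done
}

# Usage (needs a JDK on the PATH; the daemon list here is an assumption):
#   jps | missing_daemons NameNode DataNode SecondaryNameNode
```

An empty result means that every name you passed was found in the listing.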
Let's execute an example and see whether things are actually working:
# Upload everything under the etc/hadoop directory to the "input" directory in HDFS
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/naishe
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hdfs dfs -ls input
Found 29 items
-rw-r--r-- 1 naishe supergroup 4436 2015-01-20 23:20 input/capacity-scheduler.xml
-rw-r--r-- 1 naishe supergroup 1335 2015-01-20 23:20 input/configuration.xsl
-rw-r--r-- 1 naishe supergroup 318 2015-01-20 23:20 input/container-executor.cfg
[ -- snip -- ]
-rw-r--r-- 1 naishe supergroup 690 2015-01-20 23:20 input/yarn-site.xml
All set; time to run an example on it. We will run an example that finds every string
matching the dfs[a-z.]+ regular expression across all the files under the input folder
uploaded above and writes the counts to a folder called out.
# Execute grep example
$ bin/hadoop jar hadoop-examples-*.jar grep input out 'dfs[a-z.]+'
15/01/20 23:27:10 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/01/20 23:27:10 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/01/20 23:27:10 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
15/01/20 23:27:11 INFO input.FileInputFormat: Total input paths to process : 29
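
To get a feel for what the grep example computes, the same kind of count can be approximated on the local filesystem with ordinary shell tools. This is only a sketch for comparison, not how Hadoop runs the job: count_matches is a hypothetical helper name, and it assumes a grep that supports the -o option (GNU and BSD grep both do).

```shell
# Count every match of a regular expression across the files directly inside a
# directory, most frequent first. count_matches is a hypothetical helper name.
count_matches() {
    pattern=$1
    dir=$2
    # -o prints each match on its own line, -h omits file names,
    # -E enables extended regular expressions. Errors for subdirectories are hidden.
    grep -ohE "$pattern" "$dir"/* 2>/dev/null | sort | uniq -c | sort -rn
}

# Usage, run from the Hadoop installation directory:
#   count_matches 'dfs[a-z.]+' etc/hadoop
```

Each output line is a count followed by the matched string, which you can compare against the contents of the out folder once the job finishes.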