Of these three files, indexes.conf provides Hunk with the means to connect to the Hadoop cluster. For
example, to create a provider entry, I use a configuration stanza similar to the following:
[hadoop@hc2nn local]$ cat indexes.conf
[provider:cdh5]
vix.family = hadoop
vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-hy2.0.jar
vix.env.HADOOP_HOME = /usr/lib/hadoop
vix.env.JAVA_HOME = /usr/lib/jvm/jre-1.6.0-openjdk.x86_64
vix.fs.default.name = hdfs://hc2nn:8020
vix.splunk.home.hdfs = /user/hadoop/hunk/workdir
vix.mapreduce.framework.name = yarn
vix.yarn.resourcemanager.address = hc2nn:8032
vix.yarn.resourcemanager.scheduler.address = hc2nn:8030
vix.mapred.job.map.memory.mb = 1024
vix.yarn.app.mapreduce.am.staging-dir = /user
vix.splunk.search.recordreader.csv.regex = \.txt$
This creates a provider called cdh5, which describes how Hunk connects to HDFS, the Resource Manager,
and the Scheduler. The entry records where Hadoop is installed (via HADOOP_HOME) and where Java resides
(via JAVA_HOME). HDFS access is defined by the Name Node host name hc2nn and port 8020; the Resource
Manager is reached on port 8032 and the Scheduler on port 8030. The framework is declared as YARN, and
the directory on HDFS that Hunk can use as a working area is set via the property vix.splunk.home.hdfs.
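The provider on its own does not expose any data to searches; it is referenced from a virtual index stanza, also held in indexes.conf, that points at an HDFS path. As a sketch only (the index name hunk_csv and the input path shown here are illustrative, not taken from the working configuration), such a stanza might look like this:
[hunk_csv]
vix.provider = cdh5
vix.input.1.path = /data/hunk/rdbms/...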
The second file, props.conf, describes the location on HDFS of a data source stored under /data/hunk/
rdbms/. The cat command dumps the contents of the file; the extractcsv value refers to an entry in the file
transforms.conf that describes the contents of the data file:
[hadoop@hc2nn local]$ cat props.conf
[source::/data/hunk/rdbms/...]
REPORT-csvreport = extractcsv
The third file, transforms.conf, contains an entry called extractcsv, which is referenced in the props.conf
file above. It has two properties: the DELIMS value describes how the fields on each data line are delimited (in this
case, by commas), and the FIELDS property names the 14 fields of vehicle fuel-consumption data. This is the same
fuel-consumption data that was sourced in Chapter 4, where it was used to create an Oozie workflow.
[hadoop@hc2nn local]$ cat transforms.conf
[extractcsv]
DELIMS="\,"
FIELDS="year","manufacturer","model","class","engine size","cyclinders","transmission","Fuel
Type","fuel_city_l_100km","fuel_hwy_l_100km","fuel_city_mpg","fuel_hwy_mpg","fuel_l_yr","c02_g_km"