Configuring Splunk - Implementing Splunk: Big Data Reporting and Development for Operational Intelligence

Splunk will continually scan all directories from the first wildcard in a monitor path!

If /opt contains many files and directories, which it almost certainly does, Splunk will use an unfortunate amount of resources scanning all directories for matching files, constantly using memory and CPU. I have seen a single Splunk process watching a large directory structure use 2 gigabytes of memory. A little creativity can take care of this, but it is something to be aware of.

The takeaway is that if you know the possible values for *, you are better off writing multiple stanzas. For instance, assuming our directories in /opt are A and B, the following stanzas will be far more efficient:

[monitor:///opt/A/logs/access.log*]
sourcetype=access

[monitor:///opt/B/logs/access.log*]
sourcetype=access

It is also perfectly acceptable to have stanzas matching files and directories that simply don't exist. This causes no errors, but be careful not to include patterns so broad that they match unintended files.

Following symbolic links

When scanning directories recursively, the default behavior is to follow symbolic links. Often this is very useful, but it can cause problems if a symbolic link points to a large or slow file system. To control this behavior, simply set:

followSymlink = false

It's probably a good idea to put this on all of your monitor stanzas until you know you need to follow a symbolic link.
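
As a minimal sketch, the setting sits inside a monitor stanza alongside the other attributes. The stanza path and sourcetype here are reused from the /opt/A example above purely for illustration:

```ini
# inputs.conf (illustrative stanza; path and sourcetype are assumptions)
[monitor:///opt/A/logs/access.log*]
sourcetype = access
# Do not descend into symbolic links while scanning this input
followSymlink = false
```

Repeating the line in each monitor stanza keeps the behavior explicit per input, so it can be relaxed later for just the stanza that needs it.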

Setting the value of host from source

The default behavior of using the hostname from the machine forwarding the logs is almost always what you want. If, however, you are reading logs for a number of hosts, you can extract the hostname from source using host_regex or host_segment. For instance, say we have the path:

/nfs/logs/webserver1/access.log
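
One way to pull webserver1 out of this path is sketched here, under assumptions: the monitor stanza and sourcetype are hypothetical, and normally only one of the two settings would be used. host_segment counts the /-separated segments of the path starting at 1, while host_regex takes the first capture group of the regular expression as the host.

```ini
# inputs.conf (hypothetical stanza for logs collected under /nfs/logs)
[monitor:///nfs/logs/*/access.log*]
sourcetype = access
# Segments: 1 = nfs, 2 = logs, 3 = webserver1
host_segment = 3

# Alternative using a regular expression instead of a segment index:
# host_regex = /nfs/logs/([^/]+)/
```

The segment form is simpler when the hostname always sits at a fixed depth; the regex form is more flexible when it does not.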
