Splunk will continually scan all directories from the first wildcard
in a monitor path!
If /opt contains many files and directories, which it almost certainly does, Splunk
will use an unfortunate amount of resources scanning all directories for matching
files, constantly using memory and CPU. I have seen a single Splunk process
watching a large directory structure use 2 gigabytes of memory. A little creativity
can take care of this, but it is something to be aware of.
The takeaway is that if you know the possible values for *, you are better off writing
multiple stanzas. For instance, assuming our directories in /opt are A and B, the
following stanzas will be far more efficient:
[monitor:///opt/A/logs/access.log*]
sourcetype=access
[monitor:///opt/B/logs/access.log*]
sourcetype=access
It is also perfectly acceptable to have stanzas matching files and directories that
simply don't exist. This causes no errors, but be careful not to include patterns
so broad that they match unintended files.
Following symbolic links
When scanning directories recursively, the default behavior is to follow symbolic
links. Often this is very useful, but it can cause problems if a symbolic link points
to a large or slow file system. To control this behavior, simply set:
followSymlink = false
It's probably a good idea to put this on all of your monitor stanzas until you know
you need to follow a symbolic link.
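As a sketch, the setting sits inside the monitor stanza alongside the other attributes. The stanza below reuses the /opt/A path and sourcetype from the earlier example purely for illustration:

```ini
# Illustrative only: path and sourcetype are borrowed from the earlier example.
[monitor:///opt/A/logs/access.log*]
sourcetype=access
# Do not follow symbolic links found while scanning this path.
followSymlink = false
```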
Setting the value of host from source
The default behavior of using the hostname from the machine forwarding the logs
is almost always what you want. If, however, you are reading logs for a number
of hosts, you can extract the hostname from source using host_regex or
host_segment. For instance, say we have the path:
/nfs/logs/webserver1/access.log
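One way this might look, as a sketch only: the stanza path and sourcetype below are assumptions made for illustration, and the two host settings are alternatives rather than a pair to use together.

```ini
# Hypothetical stanza covering every host directory under /nfs/logs.
[monitor:///nfs/logs/*/access.log*]
sourcetype=access
# Take the third path segment (nfs=1, logs=2, webserver1=3) as host.
host_segment = 3
# Alternative: use the first capture group of a regex matched against source.
# host_regex = /nfs/logs/([^/]+)/
```

With either setting, events read from the path above would be assigned the host value webserver1.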
 