• Splunk relies on the modification time to determine whether new
events have been written to a file. File metadata may not be updated
as quickly on a network share.
• A large directory structure will cause the Splunk process that reads logs
to use a large amount of RAM and a large percentage of the CPU. It is
advisable to move old logs out of the monitored directory to minimize
the number of files Splunk must track.
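As a sketch of such a cleanup process (the `*.log.*` naming pattern, the seven-day window, and the function name are assumptions for illustration, not Splunk requirements), a scheduled shell function could sweep rolled logs into an archive directory outside the monitored tree:

```shell
#!/bin/sh
# move_old_logs: move rolled logs older than roughly DAYS days out of a
# monitored directory, so Splunk has fewer files to track.
# Usage: move_old_logs MONITORED_DIR ARCHIVE_DIR DAYS
move_old_logs() {
    monitored="$1"; archive="$2"; days="$3"
    mkdir -p "$archive"
    # -mtime "+$days" matches files whose last modification is more than
    # $days 24-hour periods ago; rolled logs are assumed to be named like
    # app.log.1 or app.log.2023-01-01, while the active app.log is untouched.
    find "$monitored" -maxdepth 1 -type f -name '*.log.*' -mtime "+$days" \
        -exec mv {} "$archive/" \;
}
```

Run from cron, this keeps the monitored directory limited to the active log and recently rolled files.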
This setup often looks like the following figure:
This configuration may look simple, but unfortunately, it does not scale easily.
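Another way to limit how many files Splunk tracks on a share is the documented `ignoreOlderThan` setting on a monitor input in `inputs.conf`; the path, window, and sourcetype below are placeholders:

```ini
[monitor:///mnt/logs]
# Skip files whose modification time is older than seven days, so the
# tailing processor does not track every historical file on the share.
ignoreOlderThan = 7d
sourcetype = app_logs
disabled = false
```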
Consuming logs in batch
Another less common approach is to gather logs periodically from servers after the
logs have rolled. This is very similar to monitoring logs on a shared drive, except
that the problems of scale are possibly even worse.
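In `inputs.conf` terms, this pattern corresponds to Splunk's batch input, which indexes each file once and then deletes it. A minimal sketch, assuming `/opt/splunk_batch` as the drop directory:

```ini
[batch:///opt/splunk_batch]
# sinkhole is required for batch inputs: each file is indexed once and
# then deleted, so the same rolled log is never indexed twice.
move_policy = sinkhole
sourcetype = rolled_logs
```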
The advantages of this approach include:
• A forwarder does not need to be installed on each server that is writing its
logs to the share.
The disadvantages of this approach include:
• When new logs are dropped, if the files are large, the Splunk process will
only read events from one file at a time. When this directory is on an indexer,
this is fine, but when a forwarder is trying to distribute events across
multiple indexers, only one indexer will receive events at a time.
• The oldest events in the rolled log will not be loaded until the log is rolled
and copied.
• An active log cannot be copied safely, as events may be truncated
mid-copy, or Splunk may treat the updated file as a new log and index
the entire file again.
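To avoid handing Splunk a partially written file, a rolled log can be copied under a temporary name and then renamed into the watched directory, since a rename within the same filesystem is atomic. A minimal sketch (the function name and the `.staging` subdirectory are assumptions; the staging path would need to be excluded from the Splunk input, for example with a blacklist):

```shell
#!/bin/sh
# deliver_log: copy a rolled log into the directory Splunk watches without
# ever exposing a half-written file.
# Usage: deliver_log /path/to/app.log.1 /opt/splunk_batch
deliver_log() {
    src="$1"; dest_dir="$2"
    # Stage on the same filesystem as the destination so the final mv is
    # an atomic rename rather than a slow, visible copy.
    staging="$dest_dir/.staging"
    mkdir -p "$staging"
    base=$(basename "$src")
    cp "$src" "$staging/$base"
    mv "$staging/$base" "$dest_dir/$base"
}
```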