The performance issues introduced by external commands come from the following
two places:
• The work involved with launching a Python process, exporting events as
CSV to the Python process, and then importing the results back into the
Splunk process.
• The actual code of the command. A command that queries some external
data source, for instance a database, will be affected by the speed of that
external source.
In my testing, I could not get a search that uses an external command to run
less than 50 percent slower than the equivalent native search. To test this,
let's try a couple of searches as follows:
* | head 100000 | eval t=_time+1 | stats dc(t)
On my laptop, this query takes roughly four seconds to execute when run from the
command line with preview disabled, as shown in the following code:
# time /opt/splunk/bin/splunk search '* | head 100000 | eval t=_time+1
| stats dc(t)' -preview false
Now let's throw in a command included in our sample app:
* | head 100000 | echo | eval t=_time+1 | stats dc(t)
This increases the search time to slightly over six seconds, an increase of 50
percent. Included in the sample app are three variations of the echo command,
of varying complexity:
• echo: This command simply echoes the standard input to the standard output.
• echo_csv: This command uses csvreader and csvwriter.
• echo_splunk: This command uses the Python modules provided with
Splunk to gather the incoming events and then output the results. We will
use these Python modules for our example commands; see the sketch after
this list.
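To make the comparison concrete, the following is a minimal sketch of what an
echo_splunk-style command could look like. It assumes the splunk.Intersplunk
module that ships with Splunk; the actual scripts in the sample app may be
structured differently:

import splunk.Intersplunk

# Read the events that Splunk pipes in over standard input.
results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()

# Write the events back out unchanged, exactly as echo does.
splunk.Intersplunk.outputResults(results)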
Using each of these commands, the times are nearly identical, which tells me most
of the time is spent shuttling the events in and out of Splunk.
Adding required_fields=_time in commands.conf lowered times from
2.5x to 1.5x in this case, since Splunk then exports only the listed
fields to the command instead of every field. If you know the fields your
command needs, this setting can dramatically improve performance.
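As a sketch, the relevant stanza in commands.conf might look like the
following; the stanza and script names are borrowed from the echo example, and
required_fields takes a comma-separated list of field names:

[echo]
filename = echo.py
# Export only the fields the command actually needs, rather than all fields.
required_fields = _time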
 