The performance issues introduced by external commands come from the following
two places:
• The work involved with launching a Python process, exporting events as
CSV to the Python process, and then importing the results back into the
Splunk process.
• The actual code of the command. A command that queries some external
data source, for instance a database, will be affected by the speed of that
external source.
In my testing, I could not get a search that uses an external command to run
less than 50 percent slower than the equivalent native search. To test this,
let's try a couple of searches as follows:
* | head 100000 | eval t=_time+1 | stats dc(t)
On my laptop, this query takes roughly four seconds to execute when run from the
command line with preview disabled, as shown in the following code:
# time /opt/splunk/bin/splunk search '* | head 100000 | eval t=_time+1
| stats dc(t)' -preview false
Now let's throw in a command included in our sample app:
* | head 100000 | echo | eval t=_time+1 | stats dc(t)
This increases the search time to slightly over six seconds, an increase of 50
percent. Included in the sample app are three variations of the echo command,
of varying complexity:
• echo: This command simply echoes the standard input to the standard output.
• echo_csv: This command uses csvreader and csvwriter.
• echo_splunk: This command uses the Python modules provided with
Splunk to gather the incoming events and then output the results. We will
use these Python modules for our example commands; see the sketch after
this list.
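To make the comparison concrete, the following is a minimal sketch of what an
echo_splunk-style command could look like. It assumes the splunk.Intersplunk
module that ships with Splunk; the actual scripts in the sample app may be
structured differently:

import splunk.Intersplunk

# Read the events that Splunk pipes in over standard input.
results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()

# Write the events back out unchanged, exactly as echo does.
splunk.Intersplunk.outputResults(results)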
Using each of these commands, the times are nearly identical, which tells me most
of the time is spent shuttling the events in and out of Splunk.
Adding required_fields=_time in commands.conf lowered times from
2.5x to 1.5x in this case, since Splunk then exports only the listed
fields to the command instead of every field. If you know the fields your
command needs, this setting can dramatically improve performance.
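As a sketch, the relevant stanza in commands.conf might look like the
following; the stanza and script names are borrowed from the echo example, and
required_fields takes a comma-separated list of field names:

[echo]
filename = echo.py
# Export only the fields the command actually needs, rather than all fields.
required_fields = _time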
 