Summary Indexes and CSV Files - Implementing Splunk: Big Data Reporting and Development for Operational Intelligence


There are a few fairly serious disadvantages to be aware of:

• The query using the summary index must use a subset of the functions and split fields that were in the original populating query. If the subsequent query strays from what is in the original sistats data, the results may be unexpected and difficult to debug. For example:

° The following code works fine:

source="impl_splunk_gen"
| sitimechart span=1h avg(req_time) by user
| stats avg(req_time)

° The following code returns unpredictable and wildly incorrect values:

source="impl_splunk_gen"
| sitimechart span=1h avg(req_time) by user
| stats max(req_time)

Notice that avg went into sitimechart, but we then tried to calculate max from the results.

• Using dc (distinct count) with sistats can produce huge events. This happens because, to determine unique values accurately over slices of time, all original values must be kept. One common use case is finding the top IP addresses that hit a public-facing server. See the Calculating top for a large time frame section for alternative approaches to this problem.

• The contents of the summary index are quite difficult to read, as they are not meant to be consumed directly by humans.
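The first two caveats follow from what a summary-populating si* search has to keep for each time slice. The Python sketch that follows is a toy model, not Splunk's actual summary format. It makes two assumptions. First, an average is stored as a per-slice sum and count. Second, a distinct count is stored as the set of values seen in each slice. The events and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample events: (hour, user, req_time). Invented for illustration.
EVENTS = [
    (0, "alice", 120), (0, "alice", 80), (0, "bob", 300),
    (1, "alice", 40), (1, "bob", 100), (1, "bob", 500),
]


def summarize_avg(events):
    """Toy model (assumption, not Splunk's format) of summarizing
    avg(req_time) by user with span=1h: each (hour, user) slice keeps only
    a running sum and count, enough to rebuild an average later but not
    the individual values."""
    partials = defaultdict(lambda: {"sum": 0, "count": 0})
    for hour, user, req_time in events:
        slot = partials[(hour, user)]
        slot["sum"] += req_time
        slot["count"] += 1
    return dict(partials)


def avg_from_summary(partials):
    """Recombine the partials: overall average = total sum / total count."""
    total = sum(p["sum"] for p in partials.values())
    count = sum(p["count"] for p in partials.values())
    return total / count


def summarize_distinct(events):
    """Toy model of summarizing dc(user) with span=1h. Per-hour counts
    cannot simply be added (the same user may appear in many hours), so
    every distinct value seen in each slice has to be stored."""
    seen = defaultdict(set)
    for hour, user, _ in events:
        seen[hour].add(user)
    return dict(seen)


def dc_from_summary(seen):
    """Distinct count over the whole range = size of the union of the slices."""
    return len(set().union(*seen.values()))


if __name__ == "__main__":
    partials = summarize_avg(EVENTS)
    print(avg_from_summary(partials))  # 190.0, same as averaging the raw events

    # max was never summarized: the only numbers left are sums and counts,
    # so a "max" computed from them has nothing to do with the real maximum.
    print(max(p["sum"] for p in partials.values()))  # 600, not a real req_time
    print(max(r for _, _, r in EVENTS))              # 500, the true maximum

    seen = summarize_distinct(EVENTS)
    print(dc_from_summary(seen))                       # 2 distinct users
    print(sum(len(users) for users in seen.values()))  # 4 if hourly counts were added
```

The sketch illustrates the underlying design trade-off. An average splits cleanly into mergeable pieces (a sum and a count). A maximum needs its own stored partial. A distinct count needs the raw values themselves, which is why dc summaries can grow so large.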

To see how all of this works, let's build a few queries. We start with a simple stats query as follows:

sourcetype=impl_splunk_gen
| stats count max(req_time) avg(req_time) min(req_time) by user

This produces the results you would expect:
