There are a few fairly serious disadvantages to be aware of:
• The query using the summary index must use a subset of the functions and
split fields that were in the original populating query. If the subsequent
query strays from what is in the original sistats data, the results may
be unexpected and difficult to debug. For example:
° The following code works fine:
source="impl_splunk_gen"
| sitimechart span=1h avg(req_time) by user
| stats avg(req_time)
° The following code returns unpredictable and wildly incorrect
values:
source="impl_splunk_gen"
| sitimechart span=1h avg(req_time) by user
| stats max(req_time)
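Notice that only avg went into sitimechart, so the summary data holds
only what is needed to compute an average. We then tried to calculate max
from those results, which the data cannot support.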
• Using dc (distinct count) with sistats can produce huge events. This
happens because every original value must be kept to determine unique
values accurately across slices of time. One common use case is to find the top IP
addresses that hit a public-facing server. See the Calculating top for a large time
frame section for alternate approaches to this problem.
• The contents of the summary index are quite difficult to read as they are not
meant to be used by humans.
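The first two disadvantages can be illustrated outside Splunk. The following Python sketch is not Splunk's actual summary format. It assumes, purely for illustration, that a summary built for avg keeps a running sum and count per time bucket and user, and that a summary built for dc keeps the set of values seen in each bucket. The event data and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw events: (hour_bucket, user, req_time).
events = [
    (0, "alice", 100), (0, "alice", 300), (0, "bob", 50),
    (1, "alice", 200), (1, "bob", 150), (1, "bob", 950),
]


def summarize_avg(rows):
    """Keep only what avg needs: [sum, count] per (hour, user).

    Assumption: this mimics the idea of a summary populated for avg,
    not the real fields Splunk writes.
    """
    summary = defaultdict(lambda: [0, 0])
    for hour, user, req_time in rows:
        summary[(hour, user)][0] += req_time
        summary[(hour, user)][1] += 1
    return dict(summary)


def avg_from_summary(summary):
    """Recombine the partial sums and counts into an overall average."""
    total = sum(s for s, _ in summary.values())
    count = sum(c for _, c in summary.values())
    return total / count


def summarize_dc(rows):
    """Distinct count needs every distinct value kept per bucket."""
    seen = defaultdict(set)
    for hour, user, _ in rows:
        seen[hour].add(user)
    return dict(seen)


def dc_from_summary(seen):
    """Union the per-bucket sets; adding per-bucket counts would overcount."""
    return len(set().union(*seen.values()))


avg_summary = summarize_avg(events)
true_avg = sum(e[2] for e in events) / len(events)
true_max = max(e[2] for e in events)

# avg survives summarization because sums and counts can be recombined.
recovered_avg = avg_from_summary(avg_summary)

# max does not: the best available guess from an avg-only summary is the
# largest per-bucket average, which is not the true maximum.
max_of_bucket_avgs = max(s / c for s, c in avg_summary.values())

dc_summary = summarize_dc(events)
distinct_users = dc_from_summary(dc_summary)
naive_dc = sum(len(users) for users in dc_summary.values())

print(recovered_avg == true_avg)          # avg is recoverable
print(max_of_bucket_avgs, true_max)       # max is not
print(distinct_users, naive_dc)           # sets are required for dc
```

In this sketch the avg summary grows with the number of buckets and users, while the dc summary grows with the number of distinct values in each bucket, which is why a field such as a client IP address can make the summarized events very large.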
To see how all of this works, let's build a few queries. We start with a simple stats
query as follows:
sourcetype=impl_splunk_gen
| stats count max(req_time) avg(req_time) min(req_time) by user
This produces results like you would expect: