There are a few fairly serious disadvantages to be aware of:
• The query using the summary index must use a subset of the functions and split fields that were in the original populating query. If the subsequent query strays from what is in the original sistats data, the results may be unexpected and difficult to debug. For example:
° The following code works fine:
source="impl_splunk_gen"
| sitimechart span=1h avg(req_time) by user
| stats avg(req_time)
° The following code returns unpredictable and wildly incorrect values:
source="impl_splunk_gen"
| sitimechart span=1h avg(req_time) by user
| stats max(req_time)
Notice that avg went into sistats, but we tried to calculate max from the results.
• Using dc (distinct count) with sistats can produce huge events. This happens because, to accurately determine unique values over slices of time, all original values must be kept. One common use case is to find the top IP addresses that hit a public-facing server. See the Calculating top for a large time frame section for alternate approaches to this problem.
• The contents of the summary index are quite difficult to read as they are not
meant to be used by humans.
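The first two caveats can be illustrated with a toy model outside Splunk. The Python sketch below is not how sistats is implemented; it assumes, for illustration only, that a summary built for avg keeps just a sum and a count per time slice, and that a summary built for dc keeps the set of values seen per slice. The sample events and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample events: (hour slice, user, req_time).
EVENTS = [
    (0, "alice", 100), (0, "alice", 900), (0, "bob", 200),
    (1, "alice", 300), (1, "bob", 400), (1, "bob", 600),
]


def summarize_for_avg(events):
    """Per (hour, user), keep only what avg needs: a sum and a count."""
    summary = defaultdict(lambda: {"sum": 0, "count": 0})
    for hour, user, req_time in events:
        row = summary[(hour, user)]
        row["sum"] += req_time
        row["count"] += 1
    return dict(summary)


def avg_from_summary(summary):
    """The overall average is recoverable from the partial sums and counts."""
    total = sum(row["sum"] for row in summary.values())
    count = sum(row["count"] for row in summary.values())
    return total / count


def max_of_slice_averages(summary):
    """The best 'max' this summary can offer; it is not the true max."""
    return max(row["sum"] / row["count"] for row in summary.values())


def summarize_for_dc(events):
    """Per hour, keep every distinct user seen (assumed model for dc)."""
    seen = defaultdict(set)
    for hour, user, _ in events:
        seen[hour].add(user)
    return dict(seen)


def dc_from_summary(seen):
    """A correct distinct count needs the union of the per-slice sets."""
    return len(set().union(*seen.values()))


avg_summary = summarize_for_avg(EVENTS)
true_max = max(req_time for _, _, req_time in EVENTS)
dc_summary = summarize_for_dc(EVENTS)

print("avg from summary:", avg_from_summary(avg_summary))
print("true max:", true_max)
print("max of slice averages:", max_of_slice_averages(avg_summary))
print("distinct users:", dc_from_summary(dc_summary))
print("sum of per-slice counts:", sum(len(s) for s in dc_summary.values()))
```

In this model the average survives summarization, but the largest original value is gone, so any max computed from the summary is wrong. Likewise, adding up per-slice distinct counts double-counts users who appear in more than one slice, which is why the values themselves have to be stored and why the summary events can grow large.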
To see how all of this works, let's build a few queries. We start with a simple stats query as follows:
sourcetype=impl_splunk_gen
| stats count max(req_time) avg(req_time) min(req_time) by user
This produces results like you would expect: