Using a summary index to store these interim values can sometimes be overkill
if those values are not needed for long. In the Calculating top for a large time frame
section, we ended up storing thousands of values every few minutes. If we simply
wanted to know the top 10 per day, this might be seen as a waste. To cut down on
the noise in our summary index, we can use a CSV as cheap interim storage.
The steps are essentially to:
1. Periodically query recent data and update the CSV.
2. Capture the top values in the summary index at the end of the day.
3. Empty the CSV file.
Our periodic query looks like the following:
source="impl_splunk_gen"
| stats count by req_time
| append [inputcsv top_req_time.csv]
| stats sum(count) as count by req_time
| sort 10000 -count
| outputcsv top_req_time.csv
Let's break the query down line by line:
source="impl_splunk_gen" : This is the query to find the events for this
slice of time.
| stats count by req_time : This helps calculate the count by req_time .
| append [inputcsv top_req_time.csv] : This loads the results generated
so far from the CSV file, and adds the events to the end of our current results.
| stats sum(count) as count by req_time : This uses stats to combine
the results from our current time slice and the previous results.
| sort 10000 -count : This sorts the results descending by count . The
second word, 10000 , specifies that we want to keep the first 10,000 results.
| outputcsv top_req_time.csv : This overwrites the CSV file.
Schedule the query to run periodically, perhaps every 15 minutes. Follow the same
rules about latency as discussed in the How latency affects summary queries section.
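If you would rather manage the schedule in configuration files than through the manager interface, a savedsearches.conf stanza along the following lines would accomplish this. This is only a sketch: the search name top_req_time_interim is made up for illustration, and the time window does not include the latency adjustments just mentioned:
[top_req_time_interim]
# Run every 15 minutes, over the previous 15-minute window.
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = -15m@m
dispatch.latest_time = @m
search = source="impl_splunk_gen" | stats count by req_time | append [inputcsv top_req_time.csv] | stats sum(count) as count by req_time | sort 10000 -count | outputcsv top_req_time.csv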
 
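That takes care of step 1. For steps 2 and 3, one approach, sketched here under the assumption that the default summary index named summary is the destination, is to schedule two more searches once a day. The first captures the top 10 values from the CSV into the summary index using collect; the marker value is made up for illustration:
| inputcsv top_req_time.csv
| sort 10 -count
| collect index=summary marker="report=top_req_time_daily"
The second, scheduled to run only after the first has finished, empties the CSV file. One way to do this, assuming outputcsv's default behavior of replacing the existing file when handed no results, is:
| makeresults
| where _time < 0
| outputcsv top_req_time.csv
The where clause discards the single row that makeresults generates, so the CSV is overwritten with nothing and the next day starts with a clean slate. The order matters here: if the CSV is emptied before the capture search runs, the day's top values are lost.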