Using a summary index to store these interim values can sometimes be overkill
if those values are not needed for long. In the Calculating top for a large time frame
section, we ended up storing thousands of values every few minutes. If we simply
wanted to know the top 10 per day, this might be seen as a waste. To cut down on
the noise in our summary index, we can use a CSV as cheap interim storage.
The steps are essentially to:
1. Periodically query recent data and update the CSV.
2. Capture the top values in the summary index at the end of the day.
3. Empty the CSV file.
Our periodic query looks like the following:
source="impl_splunk_gen"
| stats count by req_time
| append [inputcsv top_req_time.csv]
| stats sum(count) as count by req_time
| sort 10000 -count
| outputcsv top_req_time.csv
Let's break the query down line by line:
source="impl_splunk_gen" : This is the query to find the events for this
slice of time.
| stats count by req_time : This helps calculate the count by req_time .
| append [inputcsv top_req_time.csv] : This loads the results generated
so far from the CSV file, and adds the events to the end of our current results.
| stats sum(count) as count by req_time : This uses stats to combine
the results from our current time slice and the previous results.
| sort 10000 -count : This sorts the results descending by count . The
second word, 10000 , specifies that we want to keep the first 10,000 results.
| outputcsv top_req_time.csv : This overwrites the CSV file.
Schedule the query to run periodically, perhaps every 15 minutes. Follow the same
rules about latency as discussed in the How latency affects summary queries section.
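If you would rather manage the schedule in configuration files than through the manager interface, a savedsearches.conf stanza along the following lines would accomplish this. This is only a sketch: the search name top_req_time_interim is made up for illustration, and the time window does not include the latency adjustments just mentioned:
[top_req_time_interim]
# Run every 15 minutes, over the previous 15-minute window.
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = -15m@m
dispatch.latest_time = @m
search = source="impl_splunk_gen" | stats count by req_time | append [inputcsv top_req_time.csv] | stats sum(count) as count by req_time | sort 10000 -count | outputcsv top_req_time.csv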
 
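That takes care of step 1. For steps 2 and 3, one approach, sketched here under the assumption that the default summary index named summary is the destination, is to schedule two more searches once a day. The first captures the top 10 values from the CSV into the summary index using collect; the marker value is made up for illustration:
| inputcsv top_req_time.csv
| sort 10 -count
| collect index=summary marker="report=top_req_time_daily"
The second, scheduled to run only after the first has finished, empties the CSV file. One way to do this, assuming outputcsv's default behavior of replacing the existing file when handed no results, is:
| makeresults
| where _time < 0
| outputcsv top_req_time.csv
The where clause discards the single row that makeresults generates, so the CSV is overwritten with nothing and the next day starts with a clean slate. The order matters here: if the CSV is emptied before the capture search runs, the day's top values are lost.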