Introducing Splunk - Splunk Essentials

Other big data descriptors

Several other terms are useful to understand when discussing big data. These are:

• Streaming data: Much of the data that is large and arrives quickly does not need to
be kept. Consider a mechanical plant, where many sensors may collect data on every
part of the assembly line. The main value of this data is to alert someone to a
possible upcoming problem (by revealing a bad trend) or to a current problem (by
flagging a metric that has exceeded some designated level); much of it does not need
to be kept for a long period of time. This type of data is called streaming data, and
Splunk, with its ability to create alerts, lets organizations use this data to prevent
problems or to act quickly on those that do occur.

Tip

Later, in Chapter 6, Using the Twitter App, we'll use streaming Twitter data for
analysis.
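The two kinds of alert described for streaming data (a metric over a designated level, and a bad trend) can be illustrated with a minimal Python sketch. This is not Splunk code; the function name `monitor`, the `limit` and `window` parameters, and the rule that a "bad trend" means a strictly rising window of readings are all assumptions made for illustration. Only the most recent readings are held in memory, mirroring the idea that most streaming data does not need to be kept.

```python
from collections import deque


def monitor(readings, limit, window=5):
    """Yield (index, kind) alerts while keeping only the last `window` readings.

    Hypothetical rules, chosen for illustration only:
      - "threshold": the current reading exceeds `limit`
      - "trend": the last `window` readings are strictly increasing
    """
    recent = deque(maxlen=window)  # older readings are discarded automatically
    for i, value in enumerate(readings):
        recent.append(value)
        if value > limit:
            yield i, "threshold"
        elif len(recent) == window:
            values = list(recent)
            # Compare each reading with the one that follows it.
            if all(a < b for a, b in zip(values, values[1:])):
                yield i, "trend"


if __name__ == "__main__":
    sensor = [1, 2, 3, 4, 5, 6, 20, 3]
    for index, kind in monitor(sensor, limit=10):
        print(index, kind)
```

A reading that crosses the limit is reported as a threshold alert even if it also continues a rising trend, since a current problem is the more urgent of the two.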

• Latency of data: With regard to data, the term latency refers to the delay before
data is entered into the system and becomes available for analysis. Splunk can analyze
data in real time without latency problems, provided it is deployed on hardware that
is sufficient to handle the indexing and searching workload. For example, if an alert
goes off, a system can be shut down immediately when there is no latency in the data.
If a denial-of-service attack is taking place, Splunk can be used to quickly figure
out what is happening while it is happening.

• Sparseness of data: Splunk is also excellent for dealing with sparse data. Much
data in retailing environments is considered sparse. Consider a store that has many
products but where most people buy just a few of them on any given shopping trip.
If the store's database has fields specifying how many items of a particular type
each customer has purchased, most of those fields would be empty if the time interval
under consideration was short. We would then say that the data is sparse. In Splunk,
the sparseness of data in a search ranges from dense (meaning that a result is
obtained 10 percent of the time or more) to sparse (from 0.01 to 1 percent of the
time). It can also extend to super sparse, often described as trying to find a needle
in a haystack (less than 0.01 percent of the time), and even to rare, which is just a
handful of cases.
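The percentage ranges given for search sparseness can be made concrete with a short Python sketch. It is an illustration only and not part of Splunk: the function name `classify_density` and its parameters are hypothetical, and the cutoffs are taken directly from the text above. The text assigns no label to results between 1 and 10 percent and gives no numeric cutoff for "rare", so the sketch reports the former as "unclassified" and does not model the latter.

```python
def classify_density(matching_events, scanned_events):
    """Label a search by the share of scanned events that match.

    Thresholds follow the ranges quoted in the text:
      dense        : 10 percent or more
      sparse       : 0.01 to 1 percent
      super sparse : less than 0.01 percent
    """
    if scanned_events <= 0:
        raise ValueError("scanned_events must be positive")

    percent = 100.0 * matching_events / scanned_events

    if percent >= 10:
        return "dense"
    if 0.01 <= percent <= 1:
        return "sparse"
    if percent < 0.01:
        return "super sparse"
    # The text does not name the range between 1 and 10 percent.
    return "unclassified"


if __name__ == "__main__":
    print(classify_density(50, 100))       # half of the events match
    print(classify_density(5, 10_000))     # 0.05 percent match
    print(classify_density(1, 100_000))    # 0.001 percent match
```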
