Other big data descriptors
Several other terms are necessary to understand when talking about big data. These
are:
Streaming data: Much of the data that is large and arrives quickly does not need to
be kept. For instance, consider a mechanical plant, where many sensors may collect
data on all parts of the assembly line. The main value of this data is to alert
someone to a possible upcoming problem (by noticing a bad trend) or to a current
problem (by drawing attention to a metric that has exceeded some designated level);
much of it does not need to be kept for a long period of time. This type of data is
called streaming data, and Splunk, with its ability to create alerts, allows
organizations to use this data to prevent problems or to act quickly on those that
do occur.
Tip
Later, in Chapter 6, Using the Twitter App, we'll use streaming Twitter data for
analysis.
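The two kinds of alert described for streaming data (a metric exceeding a designated level, and a bad trend) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Splunk implements alerts: the class name SensorMonitor, the limit value, and the rule that a "bad trend" means a strictly rising window of readings are all assumptions made for this example.

```python
from collections import deque


class SensorMonitor:
    """Keeps only a short sliding window of readings and reports alerts.

    Hypothetical helper for illustration; not part of Splunk.
    """

    def __init__(self, limit, window=5):
        if window < 2:
            raise ValueError("window must hold at least two readings")
        self.limit = limit                  # assumed alert threshold
        self.recent = deque(maxlen=window)  # older readings are discarded

    def observe(self, value):
        """Record one reading and return a list of alert strings."""
        self.recent.append(value)
        alerts = []

        # Current problem: the metric has exceeded the designated level.
        if value > self.limit:
            alerts.append("threshold exceeded")

        # Possible upcoming problem: every reading in a full window is
        # higher than the one before it (assumed definition of a bad trend).
        readings = list(self.recent)
        window_full = len(readings) == self.recent.maxlen
        rising = all(a < b for a, b in zip(readings, readings[1:]))
        if window_full and rising:
            alerts.append("rising trend")

        return alerts
```

Because the deque has a fixed maximum length, readings that fall out of the window are dropped, which mirrors the point that most streaming data does not need to be kept.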
Latency of data: With regard to data, the term latency refers to the delay before
the data is entered into the system for analysis. Splunk is able to analyze data in
real time without latency issues when it is deployed on hardware that is sufficient
to handle the indexing and searching workload. For example, if an alert goes off, a
system can be shut down immediately, provided there is no latency in the data. If a
denial-of-service attack is taking place, Splunk can quickly be used to figure out
what is happening at that very moment.
Sparseness of data: Splunk is also excellent for dealing with sparse data. Much
data in retailing environments is considered sparse. Consider a store that has many
products but where most people buy just a few of them on any given shopping trip.
If the store's database has fields specifying how many items of a particular type
have been purchased by each customer, most of the fields would be empty if the
time interval under consideration was short. We would then say that the data is
sparse. In Splunk, the sparseness of data in a search ranges from dense (meaning
that a result is obtained 10 percent of the time or more) to sparse (from 0.01 to 1
percent of the time). Beyond that are super sparse searches, which amount to trying
to find a needle in a haystack (less than 0.01 percent of the time), and rare
searches, which return just a handful of cases.
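The search-density ranges above can be written out as a small Python function. This is a minimal sketch that uses only the percentages given in the text; the function name classify_search_density and the rare_count cutoff for "a handful of cases" are assumptions made for this example, and the text does not name the range between 1 and 10 percent, so the sketch reports it as unspecified.

```python
def classify_search_density(matching_events, total_events, rare_count=5):
    """Label a search by the share of events it matches.

    rare_count is a hypothetical cutoff for "a handful of cases";
    the text gives no exact number.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if matching_events == 0:
        return "no matches"

    percent = 100.0 * matching_events / total_events

    if percent >= 10:
        return "dense"
    if 0.01 <= percent <= 1:
        return "sparse"
    if percent < 0.01:
        # Below 0.01 percent: either a needle in a haystack or,
        # when only a few events match, a rare search.
        if matching_events <= rare_count:
            return "rare"
        return "super sparse"
    # Between 1 and 10 percent is not named in the text.
    return "unspecified"
```

For example, a search that matches 5 of 1,000 events (0.5 percent) falls in the sparse range, while one that matches 150 of 1,000 (15 percent) is dense.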