Splunk data sources
Splunk was invented as a way to keep track of and analyze machine data coming from a
variety of computerized systems, and it is a powerful platform for doing just that. Since its
invention, however, it has been used for many different types of data, including machine data,
log data (which is a type of machine data), and social media data. The types of data
that Splunk is often used for are explained as follows:
Machine data: As mentioned previously, much of Splunk's data is machine data.
Machine data is data that is created each time a machine does something, even if it
is as seemingly insignificant as a tick on a clock. Each tick has information about
its exact time (down to the second) and source, and each of these becomes a field
associated with the event (the tick). The term machine data can be used in reference
to a wide variety of data coming from computerized machines, from servers to
operating systems to controllers for robotic assembly arms. Almost all machine
data includes the time it was created or when the actual event took place. If no
timestamp is included, Splunk will try to find a date in the source name or filename
and, failing that, will use the file's last modification time. As a last resort, it will
stamp the event with the time it was indexed into Splunk.
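The fallback order described above can be illustrated with a minimal Python sketch. This is not Splunk's implementation: the function names, the ISO-like date pattern, and the returned labels are all assumptions made for illustration, and the order assumed is event text first, then source name or filename, then file modification time, then index time.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern: ISO-like dates such as 2014-01-22 or 2014-01-22 13:45:10.
_TS = re.compile(r"(\d{4})-(\d{2})-(\d{2})(?:[ T](\d{2}):(\d{2}):(\d{2}))?")


def _find_timestamp(text):
    """Return the first ISO-like timestamp found in text, or None."""
    match = _TS.search(text)
    if match is None:
        return None
    year, month, day = (int(g) for g in match.group(1, 2, 3))
    hour, minute, second = (int(g) if g else 0 for g in match.group(4, 5, 6))
    try:
        return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    except ValueError:
        # Digits matched the pattern but do not form a valid date.
        return None


def assign_timestamp(event_text, source_name, file_mtime=None, now=None):
    """Pick a timestamp for an event and report where it came from."""
    # 1. Prefer a timestamp inside the event itself.
    ts = _find_timestamp(event_text)
    if ts is not None:
        return ts, "event"
    # 2. Otherwise look for a date in the source name or filename.
    ts = _find_timestamp(source_name)
    if ts is not None:
        return ts, "source name"
    # 3. Otherwise use the file's last modification time (seconds since epoch).
    if file_mtime is not None:
        return datetime.fromtimestamp(file_mtime, tz=timezone.utc), "file modification time"
    # 4. Last resort: the time the event is indexed.
    return (now or datetime.now(timezone.utc)), "index time"
```

For example, calling assign_timestamp("tick", "clock-2014-01-22.log") finds no date in the event text and so takes the date from the filename.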
Web logs: Web logs are invaluable sources of information for anyone interested in
learning about how their website is used. Deep analysis of web logs can answer
questions about which pages are visited most, which pages have problems (people
leaving quickly, discarded shopping carts, and other aborted actions), and many
others. Google, in early 2014, was registering as many as 20 billion websites each
day; more information can be found at
http://www.roche.com/media/roche_stories/roche-stories-2014-01-22.htm.
Data files: Splunk can read in data from basically all types of files containing clear
data, or as Splunk puts it, any data. Splunk can also decompress the following types of
files: tar, gz, bz2, tar.gz, tgz, tbz, tbz2, zip, and z, along with many
other formats. Splunk can even process files while they are still being written to.
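To make the idea of reading compressed files transparently more concrete, here is a small Python sketch that picks a decompressor from the file suffix and yields one line per event. It is only an illustration using the Python standard library, not how Splunk does it; the function name read_events and the suffix table are hypothetical, and only plain, gz, and bz2 files are covered.

```python
import bz2
import gzip

# Hypothetical mapping from file suffix to a function that opens the file.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def read_events(path):
    """Yield one event (line) at a time from a plain, gzip, or bzip2 file."""
    opener = open  # default: treat the file as uncompressed text
    for suffix, candidate in _OPENERS.items():
        if path.endswith(suffix):
            opener = candidate
            break
    # "rt" asks each opener for decoded text rather than raw bytes.
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Because the function is a generator, a caller can loop over read_events(path) and process a large log file one event at a time without loading it all into memory.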
Social media data: An enormous amount of data is produced by social media
every second. Consider the fact that 829 million people log in to Facebook each
day (more information can be found at http://newsroom.fb.com/company-info/)
and they spend, on average, 20 minutes at a time interacting with the site. Any
interaction on Facebook (or any other social media site) creates a significant amount
of data, even one that doesn't involve a data-intensive act such as posting a
picture, an audio file, or a video. Other social media sources of data include popular sites
such as Twitter, LinkedIn, Pinterest, and Google+ in the U.S., and QZone, WeChat,