as date-time information. Extreme volumes make the variety of this information
even more difficult to handle. It's not uncommon for organizations to
rack up many terabytes (trillions of rows) of this machine data. Today, many
organizations simply purge this data, almost as if it's a waste byproduct
(which is why we often refer to it as data exhaust). Clearly, being able to find
the hidden value in system logs is a Big Data challenge.
To meet this challenge, InfoSphere BigInsights (BigInsights) ships with the
IBM Accelerator for Machine Data Analytics (known informally as the Machine
Data Accelerator, or MDA for short), a special module that's designed to
handle the full lifecycle of log data analysis (shown in Figure 9-1).
Ingesting Machine Data
The first stage in the machine data analysis lifecycle is to ingest logs from IT
systems into HDFS (for an explanation of HDFS, see Chapter 5). The MDA
includes an ingest application, which not only handles this data movement
but also helps prepare the machine data for the subsequent processing
that will happen in BigInsights.
The MDA's data ingest function accepts logs in the form of batches, where
each batch represents one type of log. In addition to the log data itself, each
batch includes a metadata file, which describes key characteristics and inherent
assumptions. This information is necessary for the MDA to properly parse
and normalize the data. A common trait in machine data is for key metadata
elements, such as the year or the server name, to be encoded in the name of the
file where logs are stored. Clearly, when pulling together logs from different
time periods and different systems, this information needs to be factored in.
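
To make the file-name point concrete, here is a minimal Python sketch of how a year and server name might be recovered from a log file's name and folded back into its records. It does not show the MDA's actual metadata format or API: the naming convention `<server>_<logtype>_<year>.log`, the `FileMetadata` class, and both functions are assumptions invented for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical naming convention (an assumption, not an MDA rule):
#   <server>_<logtype>_<year>.log, for example web01_syslog_2012.log
FILENAME_PATTERN = re.compile(
    r"^(?P<server>[A-Za-z0-9-]+)_(?P<log_type>[A-Za-z0-9]+)_(?P<year>\d{4})\.log$"
)


@dataclass
class FileMetadata:
    """Metadata that lives in the file name rather than in the log lines."""
    server: str
    log_type: str
    year: int


def metadata_from_filename(name: str) -> Optional[FileMetadata]:
    """Return the metadata encoded in a file name, or None if it does not match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    return FileMetadata(match["server"], match["log_type"], int(match["year"]))


def complete_timestamp(partial: str, meta: FileMetadata) -> datetime:
    """Combine a year-less timestamp such as 'Mar 14 09:26:53' with the file's year."""
    return datetime.strptime(f"{meta.year} {partial}", "%Y %b %d %H:%M:%S")
```

For instance, `metadata_from_filename("web01_syslog_2012.log")` yields the server `web01` and the year 2012, which `complete_timestamp` can then attach to each year-less timestamp in that file. Without this step, records from different years or servers would be indistinguishable once the logs are pooled.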
Figure 9-1 The lifecycle of machine data analysis
[Figure labels: Ingest Logs, Extract, Transform, Indexing, Faceted Searching,
Statistical Modeling, Ad-hoc Exploration, Visualization, Reporting, Dashboards]