The IBM Big Data Analytic Accelerators - Harness the Power of Big Data

This is required because information is often omitted from each record in the interest of reducing log sizes. For example, time stamps in each record might be missing the year or time zone, which is provided in the file name or known externally. The MDA uses values that are provided in the batch's metadata to fill in the missing fields. Without this standardization of time stamp data, apples-to-apples comparisons of log files are impossible.
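
As a rough illustration of this standardization step, the following Python sketch fills in a missing year and time zone from batch-level values. The function name, the `Mon DD HH:MM:SS` input format, and the metadata parameters are all assumptions made for the example; they are not the MDA's actual interface.

```python
from datetime import datetime, timedelta, timezone

def standardize_timestamp(raw, batch_year, utc_offset_hours):
    """Parse a 'Mon DD HH:MM:SS' stamp, taking year and zone from batch metadata.

    Hypothetical helper: the parameters stand in for values that the text
    says come from the batch's metadata.
    """
    # Prepend the year before parsing so that Feb 29 parses in leap years.
    parsed = datetime.strptime(f"{batch_year} {raw}", "%Y %b %d %H:%M:%S")
    zone = timezone(timedelta(hours=utc_offset_hours))
    return parsed.replace(tzinfo=zone)

# Two records from servers in different time zones (illustrative values).
east = standardize_timestamp("Mar 07 14:22:05", 2012, -5)
utc = standardize_timestamp("Mar 07 19:22:05", 2012, 0)

print(east.isoformat())
print(east == utc)  # aware datetimes compare by absolute time
```

Once both records carry a full date and offset, they can be ordered and compared directly, which is the point of the standardization.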

4. Event enrichment: User-specified metadata (such as server name, data center name, or application name) can be associated with machine data batches during the ingestion process. For example, the server name is normally not included in batches of log records relating directly to the server itself, but it would be very useful when analyzing this information alongside batches of logs from other servers.
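
The idea of enrichment can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names, and the `enrich_records` helper are hypothetical and do not reflect how the accelerator stores metadata internally.

```python
def enrich_records(records, batch_metadata):
    """Return new records with the batch-level metadata added to each one."""
    # The record's own fields win if a key collides with the metadata.
    return [{**batch_metadata, **record} for record in records]

# Hypothetical batch metadata supplied by the user at ingestion time.
batch_metadata = {"server": "web01", "datacenter": "dc-east"}
records = [
    {"message": "disk usage at 91%"},
    {"message": "service restarted"},
]

enriched = enrich_records(records, batch_metadata)
for record in enriched:
    print(record)
```

After this step every record names its server, so batches from several servers can be analyzed together without losing track of where each record came from.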

5. Event generalization: Machine data records usually contain varying values, such as time stamps, IP addresses, measurements, percentages, and messages. By replacing the varying values with constant values (masking), the events can be generalized. Generalized events are collected and given unique IDs, which are then used for downstream analysis. These generalized events can be used in frequent sequence analysis to identify which sequences of generalized events occur most frequently. They can also be used in significance testing to identify which generalized events are the most significant with respect to a specific error. The fields to be masked can vary with the log type. Event generalization is optional, so when users don't provide any fields, generalization is not performed.
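
Masking and ID assignment can be approximated as follows. This is a minimal sketch under stated assumptions: records are flat dictionaries of string fields, the placeholder `<MASKED>` and both function names are invented for the example, and the ID scheme (sequential integers in order of first appearance) is not taken from the product.

```python
def generalize(record, masked_fields):
    """Replace the values of the masked fields with a constant placeholder."""
    return {
        key: ("<MASKED>" if key in masked_fields else value)
        for key, value in record.items()
    }

def assign_event_ids(records, masked_fields):
    """Map each record to the ID of its generalized event.

    Returns None when no fields are given, mirroring the optional
    nature of generalization described above.
    """
    if not masked_fields:
        return None
    ids = {}
    event_ids = []
    for record in records:
        # A sorted tuple of items is a hashable key for the generalized event.
        key = tuple(sorted(generalize(record, masked_fields).items()))
        event_ids.append(ids.setdefault(key, len(ids) + 1))
    return event_ids

records = [
    {"timestamp": "14:22:05", "ip": "10.0.0.7", "message": "connection refused"},
    {"timestamp": "14:23:11", "ip": "10.0.0.9", "message": "connection refused"},
    {"timestamp": "14:25:40", "ip": "10.0.0.7", "message": "disk full"},
]

event_ids = assign_event_ids(records, {"timestamp", "ip"})
print(event_ids)
```

The first two records differ only in their masked fields, so they collapse into the same generalized event. The resulting ID sequence is the kind of input that a frequent sequence analysis would consume.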

6. Extraction validation in BigSheets: Before running the extraction operation, you can preview the results in BigSheets to ensure that the correct fields are being extracted, and that the standardization, enrichment, and generalization operations were applied correctly.

7. Extracted log storage: The resulting data is stored as compressed binary files in a hierarchy of directories where each directory contains the parsed log records from a batch of logs. The logs are formatted as JSON records; each record contains the original log record and the extracted fields for the log record.
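
The storage layout described above might look something like the following sketch. The text does not name the compression codec, file names, or JSON keys, so gzip, `part-00000.json.gz`, `record`, and `fields` are all assumptions chosen for illustration, as is the use of a local temporary directory in place of the cluster's file system.

```python
import gzip
import json
import tempfile
from pathlib import Path

def write_batch(root, batch_id, parsed_records):
    """Write one batch as gzip-compressed JSON lines in its own directory."""
    batch_dir = Path(root) / batch_id
    batch_dir.mkdir(parents=True, exist_ok=True)
    out_path = batch_dir / "part-00000.json.gz"  # hypothetical file name
    with gzip.open(out_path, "wt", encoding="utf-8") as handle:
        for original, fields in parsed_records:
            # Each JSON record keeps the original line next to its extracted fields.
            handle.write(json.dumps({"record": original, "fields": fields}) + "\n")
    return out_path

def read_batch(path):
    """Read the JSON records back from a compressed batch file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle]

root = tempfile.mkdtemp()
parsed = [
    (
        "Mar 07 14:22:05 connection refused",
        {"timestamp": "2012-03-07T14:22:05-05:00", "message": "connection refused"},
    )
]
path = write_batch(root, "batch_001", parsed)
stored = read_batch(path)
print(stored[0]["fields"]["message"])
```

Keeping the original record alongside the extracted fields means that downstream analysis can always fall back to the raw line if an extraction rule turns out to be wrong.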
