records every day. If you multiply that by the number of servers to monitor, well, you
get the picture: it's big data.
Few organizations store their raw event log data in RDBMSs, because they don't
need the update and transactional processing features. Because NoSQL systems scale
and integrate with tools like MapReduce, they're cost effective when you're looking to
analyze event log data.
Though we'll use the term event log data to describe this data, a more precise term
is timestamped immutable data streams. Timestamped immutable data is created once
but never updated, so you don't have to worry about update operations. You only
need to focus on the reliable storage of the records and the efficient analysis of the
data, which is the case with many big data problems.
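To make the append-only property concrete, here is a minimal Python sketch of a timestamped immutable event writer. The function name, file path, and record fields are illustrative, not from the text; the point is that records are written once and never updated in place.

```python
import json
import time

def append_event(path, severity, message):
    """Append one timestamped event record. Records are immutable:
    written once, never updated, so the file only ever grows."""
    record = {"ts": time.time(), "severity": severity, "message": message}
    with open(path, "a") as f:  # append-only write; no update operations
        f.write(json.dumps(record) + "\n")

append_event("events.log", "INFO", "job started")
append_event("events.log", "ERROR", "disk full")
```

Because nothing is ever modified after it is written, storage can focus on reliable appends and the analysis side can safely read the stream in parallel.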
Distributed log file analysis is critical to allow an organization to quickly find errors
in systems and take corrective action before services are disrupted. It's also a good
example of the need for both real-time analysis and batch analysis of large datasets.
6.9.1 Challenges of event log data analysis
If you've ever been responsible for monitoring web or database servers, you know that
you can see what's happening on a server by looking at its detailed log file. Log events
add a record to the log file when your system starts up, when a job runs, and when
warnings or errors occur.
Events are classified according to their severity level using a standardized set of
severity codes. An example of these codes (from lowest to highest severity level)
might be TRACE, DEBUG, INFO, WARN, ERROR, or FATAL. These codes have been
standardized in the Java Log4j system.
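The ordering of these levels is what makes filtering possible: an operator can set a threshold and keep only events at or above it. A minimal Python sketch, with the numeric ranks and function name as illustrative assumptions:

```python
# Numeric ranks for the Log4j-style levels, lowest to highest severity.
SEVERITY = {"TRACE": 0, "DEBUG": 1, "INFO": 2,
            "WARN": 3, "ERROR": 4, "FATAL": 5}

def at_least(level, threshold):
    """True when an event's level meets or exceeds the threshold."""
    return SEVERITY[level] >= SEVERITY[threshold]

at_least("ERROR", "WARN")   # True: ERROR outranks WARN
at_least("INFO", "WARN")    # False: INFO is below the threshold
```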
Most events found in log files are informational (INFO level) events. They tell you
how fast a web page is served or how quickly a query is executed. Informational events
are generally used for looking at system averages and monitoring performance. Other
event types such as WARN, ERROR, or FATAL events are critical and should prompt
an operator to take action or intervene.
Filtering and reporting on log events on a single system is straightforward and can
be done by writing a script that searches for keywords in the log file. In contrast, big
data problems occur when you have hundreds or thousands of systems all generating
events on servers around the world. The challenge is to create a mechanism to get
immediate notification of critical events and allow the noncritical events to be ignored.
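The single-system case described above really is a short script. A minimal sketch in Python, where the log format and keyword list are illustrative assumptions:

```python
def critical_lines(log_path, keywords=("ERROR", "FATAL")):
    """Yield only the lines that mention a critical keyword;
    the noncritical bulk of the log is skipped."""
    with open(log_path) as f:
        for line in f:
            if any(k in line for k in keywords):
                yield line.rstrip("\n")

# Hypothetical sample log for demonstration.
with open("sample.log", "w") as f:
    f.write("INFO request served in 12ms\n")
    f.write("ERROR connection refused\n")
    f.write("INFO cache hit\n")

print(list(critical_lines("sample.log")))
```

This works for one server; the big data challenge is doing the equivalent across thousands of log streams at once, without drowning operators in noncritical events.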
A common solution to this problem is to create two channels of communication
between a server and the operations center. Figure 6.14 shows how these channels
work. At the top of the diagram, you see where all events are pulled from the server,
transformed, and then the aggregates updated in a reliable filesystem such as HDFS.
In the lower part of the diagram, you see the second channel, where critical events
are retrieved from the server and sent directly to the operations dashboard for imme-
diate action.
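The two-channel routing can be sketched in a few lines of Python. The sinks here are plain lists standing in for the real destinations (an aggregate store on a filesystem such as HDFS, and an operations dashboard); the names and critical-level set are assumptions for illustration.

```python
CRITICAL = {"ERROR", "FATAL"}

def route(event, batch_sink, alert_sink):
    """Channel 1: every event goes to the batch sink for later
    aggregate analysis. Channel 2: critical events are also pushed
    straight to the alert sink for immediate operator action."""
    batch_sink.append(event)                  # batch channel: all events
    if event["severity"] in CRITICAL:
        alert_sink.append(event)              # alert channel: critical only

batch, alerts = [], []
for e in [{"severity": "INFO", "msg": "page served"},
          {"severity": "FATAL", "msg": "service down"}]:
    route(e, batch, alerts)
```

Separating the channels lets the batch path trade latency for throughput while the alert path stays small and fast.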