Here we build a miniature distributed Flume system for collecting feeds (log events).
Agent nodes receive data (RSS feeds in this case) from an RSS feed reader and pass the
feeds on to a collector node, which is responsible for storing them in an HDFS cluster.
In this example, we use two Flume agent nodes, one Flume collector node, and a
three-node HDFS cluster. Table 9.1 describes the sources and sinks for the agent and
collector nodes.
TABLE 9.1 Sources and sinks for agent and collector nodes
Nodes            Source     Sink
Agent node       RSS feed   Collector
Collector node   Agents     HDFS
Figure 10.13 shows the architectural overview of our multihop system with two agent
nodes, one collector node, and an HDFS cluster. The RSS web feed is delivered to an Avro
source on each agent, which stores the feeds in a memory channel. As feeds accumulate in
the memory channels of the two agents, their Avro sinks send these events to the collector
node's Avro source. The collector also uses a memory channel, together with an HDFS sink
that writes the feeds into the HDFS cluster. The code and configurations are provided in
Listing 10.1.
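As a rough illustration of the topology just described, a Flume properties file for one agent and the collector might look like the following. This is only a sketch: the component names (agent1, collector, rssSource, memChannel, avroSink, hdfsSink), the host names, the ports, the channel capacity, and the HDFS path are all assumptions chosen for illustration, not values taken from the text. The second agent would mirror the first.

```properties
# --- Agent node (hypothetical name "agent1") ---
agent1.sources = rssSource
agent1.channels = memChannel
agent1.sinks = avroSink

# Avro source that receives events from the RSS reader (port is an assumption)
agent1.sources.rssSource.type = avro
agent1.sources.rssSource.bind = 0.0.0.0
agent1.sources.rssSource.port = 41414
agent1.sources.rssSource.channels = memChannel

# In-memory buffer between source and sink (capacity is an assumption)
agent1.channels.memChannel.type = memory
agent1.channels.memChannel.capacity = 1000

# Avro sink that forwards events to the collector (host and port are assumptions)
agent1.sinks.avroSink.type = avro
agent1.sinks.avroSink.hostname = collector-host
agent1.sinks.avroSink.port = 41415
agent1.sinks.avroSink.channel = memChannel

# --- Collector node (hypothetical name "collector") ---
collector.sources = avroSource
collector.channels = memChannel
collector.sinks = hdfsSink

# Avro source that the agents' Avro sinks connect to
collector.sources.avroSource.type = avro
collector.sources.avroSource.bind = 0.0.0.0
collector.sources.avroSource.port = 41415
collector.sources.avroSource.channels = memChannel

collector.channels.memChannel.type = memory
collector.channels.memChannel.capacity = 1000

# HDFS sink that writes the feeds into the cluster (path is an assumption)
collector.sinks.hdfsSink.type = hdfs
collector.sinks.hdfsSink.hdfs.path = hdfs://namenode-host:8020/flume/rss
collector.sinks.hdfsSink.hdfs.fileType = DataStream
collector.sinks.hdfsSink.channel = memChannel
```

Note that a source lists its channels with the plural key `channels`, while a sink is bound to exactly one channel with the singular key `channel`.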
The Java code in the listing describes an RSS reader that reads RSS web feeds from the
BBC news website. RSS is a family of web feed formats used to publish frequently updated
content (such as blog entries, news headlines, audio, and video) in a standardized format.
RSS follows a publish-subscribe model in which a reader checks its subscribed feeds
regularly for updates.
The Java code uses the java.net and javax.xml APIs to read the contents of a URL
source into a W3C DOM document and processes that information before writing it
to the Flume channel.
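The parsing step described above can be sketched in isolation. The following is a minimal example, not the book's listing: the class name RssTitleSketch, the method parseTitles, and the inline sample feed are hypothetical, and the step that hands events to Flume is deliberately left out. It assumes a standard RSS layout in which each item element contains a title element.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssTitleSketch {

    // Parses an RSS document from a stream into a W3C DOM Document and
    // returns the trimmed text of the <title> inside each <item>.
    public static List<String> parseTitles(InputStream in) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);

        NodeList items = doc.getElementsByTagName("item");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList titleNodes = item.getElementsByTagName("title");
            if (titleNodes.getLength() > 0) {
                titles.add(titleNodes.item(0).getTextContent().trim());
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Inline sample feed (an assumption) so the sketch runs without network access.
        String sample = "<rss version=\"2.0\"><channel><title>Sample feed</title>"
            + "<item><title>First headline</title></item>"
            + "<item><title>Second headline</title></item>"
            + "</channel></rss>";
        InputStream in =
            new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (String title : parseTitles(in)) {
            System.out.println(title);
        }
    }
}
```

To read a live feed instead of the sample string, the stream could come from java.net.URL, for example `new URL(feedUrl).openStream()`, where feedUrl is whichever feed address you subscribe to. Only item titles are collected here, so the channel-level title is skipped.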
Listing 10.1: Java code RSSReader.java

import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RSSReader {
    private static RSSReader instance = null;

    private RSSReader() {
    }