columns. In that sense, there could be millions of columns. In contrast, SQL
Server is limited to 1,024 columns.
Architecturally, HBase belongs to the master/slave family of distributed
Hadoop implementations. It also relies heavily on ZooKeeper (an Apache
project we discuss shortly).
Flume
Flume is the StreamInsight of the Hadoop ecosystem. As you would expect,
it is a distributed system that collects, aggregates, and moves large volumes
of streaming event data into HDFS. Flume is also fault tolerant and can be
tuned for failover and recovery. In general, however, faster recovery
comes at the cost of some performance; so, as with most things, a balance
needs to be found.
The Flume architecture consists of the following components:
• Client
• Source
• Channel
• Sink
• Destination
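These components are wired together declaratively in an agent's properties file. The fragment below is an illustrative sketch, not taken from this book: the agent and component names (agent1, src1, ch1, sink1) are invented, the netcat source and HDFS path are placeholders, and exact property keys can vary by Flume version and component type.

```properties
# Name the components on this agent (names are arbitrary)
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: listens for newline-separated events on a TCP port
agent1.sources.src1.type = netcat
agent1.sources.src1.bind = localhost
agent1.sources.src1.port = 44444
agent1.sources.src1.channels = ch1

# Channel: an in-memory buffer between source and sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Sink: drains the channel and delivers events to HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.sink1.channel = ch1
```

Swapping the memory channel for a file channel is what gives the durable, recoverable behavior described below: events survive an agent restart because they are persisted to disk.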
Events flow from the client to the source. The source is the first Flume
component. The source inspects the event and then farms it out to one
or more channels for processing. Each channel is consumed by a sink. In
Hadoop parlance, the event is "drained" by the sink. The channel provides
the separation between source and sink and is also responsible for managing
recovery by persisting events to the file system if required.
Once an event is drained, it is the sink's responsibility to deliver the
event to the destination. A number of different sinks are available,
including an HDFS sink. For the Integration Services users out there
familiar with the term backpressure, you can think of the channel as the
component that handles backpressure: if the source is receiving events
faster than they can be drained, it is the channel's responsibility to grow and
manage that accumulation of events.
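The channel's role as a buffer between source and sink can be sketched in a few lines. This is a minimal model, not Flume's actual API: the Channel and Sink classes and their method names are invented for illustration. The key idea is that a bounded buffer decouples the producing side from the draining side, which is how backpressure is absorbed.

```python
from queue import Queue


class Channel:
    """Bounded buffer sitting between a source and a sink (illustrative)."""

    def __init__(self, capacity):
        # When the buffer is full, put() blocks the source:
        # that blocking is the backpressure.
        self.buffer = Queue(maxsize=capacity)

    def put(self, event):
        self.buffer.put(event)

    def take(self):
        return self.buffer.get()

    def empty(self):
        return self.buffer.empty()


class Sink:
    """Drains events from a channel and delivers them to a destination."""

    def __init__(self, channel):
        self.channel = channel
        self.delivered = []  # stand-in for a real destination such as HDFS

    def drain(self):
        self.channel.take_and_deliver = None  # no-op placeholder removed below
        self.delivered.append(self.channel.take())


# A source farms ten events out to the channel; the sink drains them.
channel = Channel(capacity=100)
sink = Sink(channel)

for i in range(10):
    channel.put({"event_id": i})

while not channel.empty():
    sink.drain()

print(len(sink.delivered))
```

In real Flume the source and sink run on separate threads, so the channel's bounded capacity is what slows a fast source down to the rate the sink can sustain.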
A single pass through a source, channel, and sink is known as a hop. The
components for a hop exist in a single JVM called an agent. However, Flume