Storm
License
Apache License, Version 2.0
Activity
High
Purpose
Streaming Ingest
Official Page
http://storm.apache.org
Hadoop Integration
API Compatible
Many of the technologies in the big data ecosystem, including Hadoop MapReduce, are built with very large tasks in mind. These systems are designed to perform work in batches, bundling groups of smaller tasks into larger tasks and distributing those large tasks.
While batch processing is an effective strategy for performing complex analysis of very large amounts of data in a distributed and fault-tolerant fashion, it's ill-suited for processing data in real time. This is where a system like Storm comes in. Storm follows a stream processing model rather than a batch processing model. This means it's designed to quickly perform relatively simple transformations of very large numbers of small records.
In Storm, a workflow is called a "topology," with inputs called "spouts" and transformations called "bolts." It's important to note that Storm topologies are very different from MapReduce jobs, because jobs have a beginning and an end while topologies do not. The intent is that once you define a topology, data will continue to stream in from your spout and be processed through a series of bolts.
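For illustration, here is a minimal sketch of wiring a spout and a bolt together with Storm's Java TopologyBuilder API. RandomSentenceSpout and WordCountBolt are hypothetical stand-ins for your own spout and bolt classes, and the package names assume a recent Storm release (older releases used the backtype.storm packages).

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Spout: the source of the stream; it keeps emitting tuples
        // (here, sentences) for as long as the topology runs.
        // RandomSentenceSpout is a hypothetical spout implementation.
        builder.setSpout("sentences", new RandomSentenceSpout(), 1);

        // Bolt: a transformation step; shuffleGrouping spreads incoming
        // tuples randomly across the bolt's two parallel instances.
        // WordCountBolt is a hypothetical bolt implementation.
        builder.setBolt("counter", new WordCountBolt(), 2)
               .shuffleGrouping("sentences");

        // Run in-process for testing; the topology processes data
        // continuously until it is explicitly killed.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", new Config(), builder.createTopology());
        Thread.sleep(30_000);
        cluster.killTopology("word-count");
        cluster.shutdown();
    }
}

In a production deployment you would submit the same topology to a running cluster with StormSubmitter rather than a LocalCluster, and it would stream indefinitely until killed.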
Tutorial Links
In addition to the official Storm tutorial, there is an excellent set of starter resources on GitHub in the storm-starter project.