frameworks are designed for building websites, not high-performance data
collection systems. Instead, use a lightweight framework rather than a
heavyweight solution designed for implementing relatively low-volume
interactive websites.
As mentioned earlier in the section on programming languages, the Go
language often performs well at this task. Although it does not yet have the
outright performance of well-tuned Java web servers, it still manages
to perform better than nearly everything else in practical benchmarks.
Additionally, it generally seems to have a lighter memory footprint, and
there are some indications that it should perform better than a Java
application when a truly enormous number of connections are being
made to the server. Anecdotal evidence suggests that handling in
excess of 50,000 concurrent connections is not outside the realm of “normal”
for Go. The downside is that relatively few people know how to program in Go,
so it can be difficult to locate resources.
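To make the idea of a lightweight collection endpoint concrete, here is a minimal sketch that uses only Go's standard library. The `Collector` type, the `/collect` path, the 64 KiB body limit, and the queue size are assumptions made for illustration, not something prescribed above. The handler places each posted event on a bounded in-memory queue and rejects new events when the queue is full, so a slow downstream stage cannot stall incoming connections.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Collector (a hypothetical name) buffers incoming events in a bounded
// queue so that request handlers never block on downstream processing.
type Collector struct {
	queue chan []byte
}

// NewCollector returns a Collector whose queue holds up to size events.
func NewCollector(size int) *Collector {
	return &Collector{queue: make(chan []byte, size)}
}

// Enqueue attempts a non-blocking send and reports whether the event
// was accepted.
func (c *Collector) Enqueue(event []byte) bool {
	select {
	case c.queue <- event:
		return true
	default:
		return false // queue full: shed load rather than stall the caller
	}
}

// ServeHTTP accepts POSTed events. When served by net/http, each
// connection is handled on its own goroutine.
func (c *Collector) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	// Assumed limit: cap each event at 64 KiB.
	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 64<<10))
	if err != nil {
		http.Error(w, "bad request body", http.StatusBadRequest)
		return
	}
	if !c.Enqueue(body) {
		http.Error(w, "queue full", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	// A queue of size 1 with no consumer makes the overflow path visible.
	c := NewCollector(1)
	for i := 0; i < 2; i++ {
		req := httptest.NewRequest(http.MethodPost, "/collect",
			strings.NewReader(`{"event":"click"}`))
		rec := httptest.NewRecorder()
		c.ServeHTTP(rec, req)
		fmt.Println(rec.Code) // 202 for the first event, 503 for the second
	}
	// To serve real traffic instead: http.ListenAndServe(":8080", c)
}
```

The demonstration in `main` exercises the handler in-process, without opening a port. A real deployment would also need a consumer goroutine that drains the queue and forwards events to the data-flow layer.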
Data Flow
If there is a pre-existing system to be integrated, choose Flume. Its built-in
suite of interfaces to other environments makes it an ideal choice for adding
a real-time processing system to a legacy environment. It is fairly easy
to configure and maintain and will get the project up and running with
minimal friction.
If retrofitting a system or building something new from scratch, there is
almost no reason not to use Kafka. The only plausible exception is a
requirement to use a language that does not have an appropriate
Kafka client. In that case, the community would certainly welcome even a
partial client (probably the Producer portion) for that language. Even with
the new safety features, Kafka's performance is still very good (to the point
of being a counterexample to the argument against Scala in the previous
section), and it does essentially exactly what is necessary.
Processing
Although Samza shows great promise, it is unfortunately still too immature
for most first attempts at a real-time processing system. This is primarily
due to its reliance on the Apache YARN framework for its distributed
processing. The claim is that YARN can support many different types of
computation, but the reality of the situation is that nearly all the