As it happens, this sort of environment is not unique to LinkedIn. Many
companies that deal primarily with “Internet data” find themselves in the
same situation. Additionally, many of them are engineering focused,
meaning that most of their software is developed in-house rather than
licensed from a third party. This allows such companies to use the Kafka
model, and it is useful enough that a similar system, called Kinesis, was
recently announced by Amazon.com. This product aims to form a core
part of the integration between various Amazon.com services, such as its
key-value store DynamoDB, its object store S3, its Hadoop
infrastructure Elastic MapReduce, and its high-performance data
warehouse Redshift.
This section covers the design of Kafka's internals and how they integrate to
solve the problems mentioned here.
Topics, Partitions, and Brokers
The organizing element of Kafka is the “topic.” In the Kafka system, a topic
is a logical grouping of the data such that all data contained within the
topic should be somehow related. Most commonly, the messages in a topic are
related only in that they can be parsed by the same common mechanism, and
not much else.
A topic is further subdivided into a number of partitions. These partitions
are, effectively, the limit on the rate that an I/O-bound consumer can
retrieve data from Kafka. This is because clients often use a single consumer
thread (or process) per partition. For example, with Camus, a tool for
moving data from Kafka into the Hadoop Distributed File System (HDFS)
using Hadoop, a Mapper can pull from multiple partitions, but multiple
Mappers will not pull from the same partition.
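The parallelism limit described above can be sketched in plain Python. This is not a Kafka client; the in-memory queues, partition count, and thread-per-partition layout are all assumptions chosen to illustrate why throughput scales only up to the number of partitions:

```python
import queue
import threading

# Hypothetical topic with 3 partitions, each modeled as a queue of messages.
partitions = [queue.Queue() for _ in range(3)]
for p, q in enumerate(partitions):
    for i in range(5):
        q.put(f"partition-{p}-msg-{i}")

consumed = []
lock = threading.Lock()

def consume(q):
    # One consumer thread per partition: since no two consumers share a
    # partition, at most len(partitions) threads can make progress, which
    # is exactly the I/O parallelism ceiling the text describes.
    while not q.empty():
        msg = q.get()
        with lock:
            consumed.append(msg)

threads = [threading.Thread(target=consume, args=(q,)) for q in partitions]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(consumed))  # all 15 messages, drained by at most 3 parallel consumers
```

Adding a fourth consumer thread here would leave it idle; the same is true of a fourth Camus Mapper against a three-partition topic.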
Partitions are also used to logically organize a topic. Producer
implementations usually provide a mechanism to choose the Kafka partition
for a given message based on the key of that message.
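A common scheme for such key-based partition selection can be sketched as follows. The hash-modulo approach shown here is an assumption for illustration, not necessarily the exact default of any particular Kafka producer implementation:

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    # Hash the message key and take it modulo the partition count, so
    # every message with the same key lands on the same partition (and
    # per-key ordering is therefore preserved within that partition).
    return zlib.crc32(key) % num_partitions

# All messages for a given user id are routed to one partition.
p1 = choose_partition(b"user-42", 8)
p2 = choose_partition(b"user-42", 8)
assert p1 == p2
```

Because the mapping is deterministic, a consumer of a single partition sees every message for the keys that hash to it, in order.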
Partitions themselves are distributed among brokers, which are the physical
processes that make up a Kafka cluster. Typically, each broker in the cluster
corresponds to a separate physical server and manages all of the writes to
that server's disk. The partitions are then uniformly distributed across the
different brokers and, in Kafka 0.8 and later, replicas are distributed across
other brokers in the cluster.
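The distribution of partitions and their replicas across brokers can be sketched with a simple placement function. This round-robin-with-offset scheme is an illustration under assumed names, not Kafka's exact assignment algorithm:

```python
def assign(num_partitions: int, brokers: list, replication_factor: int) -> dict:
    # Spread partition leaders round-robin over the brokers, and place
    # each additional replica on the next broker in the ring, so that
    # no single broker holds two copies of the same partition.
    n = len(brokers)
    return {
        p: [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    }

layout = assign(6, ["broker-0", "broker-1", "broker-2"], replication_factor=2)
for partition, replicas in layout.items():
    print(partition, replicas)
```

With six partitions, three brokers, and a replication factor of two, each broker ends up leading two partitions and holding two replicas of partitions led elsewhere, which is the uniform spread the text describes.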