system when they have been consumed. Kafka has no such mechanism for
removing messages, instead relying on consumers to keep track of the offset
of the last message consumed. Although the high-level consumer shipped
with Kafka simplifies this somewhat by using ZooKeeper to manage the
offset, the brokers themselves have no notion of consumers or their state.
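The division of labor described above can be illustrated with a small in-memory model. This is only a sketch and does not use the Kafka client API; the Log and Consumer classes and their method names are hypothetical. The point it shows is that the log keeps every message and holds no per-consumer state, while each consumer remembers its own next offset.

```python
class Log:
    """Append-only log that never deletes a message because it was read."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message

    def read(self, offset, max_messages=10):
        # The log holds no consumer state; the caller supplies the offset.
        return self.messages[offset:offset + max_messages]


class Consumer:
    """Hypothetical consumer that tracks its own position in the log."""

    def __init__(self, log):
        self.log = log
        self.next_offset = 0

    def poll(self, max_messages=10):
        batch = self.log.read(self.next_offset, max_messages)
        self.next_offset += len(batch)  # only the consumer advances this
        return batch


log = Log()
for text in ["a", "b", "c"]:
    log.append(text)

# Two independent consumers read the same log at different speeds.
fast, slow = Consumer(log), Consumer(log)
fast_batch = fast.poll()
slow_batch = slow.poll(max_messages=1)
```

Because each consumer owns its offset, the slow consumer can still read the remaining messages later, and neither consumer affects what the other sees.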
Replication
With version 0.8, Kafka introduced true replication for a broker cluster.
Previously, any notion of replication had to be handled via the
cluster-to-cluster mirroring features discussed in the next section. However,
this is merely a backup strategy rather than a replication strategy.
Kafka's replication is handled at the topic level. Each partition of a topic has
a leader partition that is replicated by zero or more follower partitions (in
Kafka 0.8, unreplicated topics are simply leaders without followers). These
follower partitions are distributed among the different physical brokers with
the intent to place each follower on a different broker. Implementations
should ensure that they have sufficient broker capacity to support the
number of partitions and replicas that will be used for a topic.
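To make the capacity concern concrete, the following sketch places each partition's leader and followers on distinct brokers using a simple round-robin rule. This is an illustration under stated assumptions, not Kafka's actual assignment algorithm: the function assign_replicas, its parameters, and the broker names are hypothetical, and replication_factor here counts the leader plus its followers.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Round-robin placement: every replica of a partition gets its own broker."""
    if replication_factor > len(brokers):
        # Without enough brokers, two replicas would have to share a broker.
        raise ValueError("replication factor exceeds the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        replicas = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
        assignment[partition] = {
            "leader": replicas[0],
            "followers": replicas[1:],
        }
    return assignment


plan = assign_replicas(
    ["broker-0", "broker-1", "broker-2"],
    num_partitions=4,
    replication_factor=2,
)
```

With three brokers and two replicas per partition, no partition has its leader and follower on the same broker. Asking for four replicas on the same three brokers raises an error, which is the capacity check the text recommends making up front.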
When a partition is created, all of these followers are considered to be
“in-sync” and form the in-sync replica set (ISR). When a message arrives at
the leader partition to be written, it is first appended to the leader's log. The
message is then forwarded to each of the follower partitions currently in the
ISR. After each of the partitions in the ISR acknowledges the message, the
message is considered to be committed and can now be read by consumers.
The leader also occasionally records a high watermark containing the offset
of the most recently committed message and propagates it to the follower
partitions in the ISR.
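The write path just described can be sketched as a small simulation. The class and method names are hypothetical, forwarding is modeled as a direct method call rather than network traffic, and a high watermark of -1 is used here to mean that nothing has been committed yet.

```python
class Replica:
    """One copy of a partition's log."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.high_watermark = -1  # offset of the last committed message

    def append(self, message):
        self.log.append(message)
        return True  # acknowledgement sent back to the leader


class Leader(Replica):
    def __init__(self, name, followers):
        super().__init__(name)
        self.isr = list(followers)  # all followers start out in sync

    def write(self, message):
        # 1. Append to the leader's own log first.
        self.log.append(message)
        offset = len(self.log) - 1
        # 2. Forward to every follower currently in the ISR.
        acks = [follower.append(message) for follower in self.isr]
        # 3. The message is committed once every ISR member has acknowledged.
        if all(acks):
            self.high_watermark = offset
        return offset

    def propagate_high_watermark(self):
        # Done occasionally, not on every write.
        for follower in self.isr:
            follower.high_watermark = self.high_watermark

    def read_committed(self):
        # Consumers only see messages up to the high watermark.
        return self.log[: self.high_watermark + 1]


f1, f2 = Replica("f1"), Replica("f2")
leader = Leader("leader", [f1, f2])
leader.write("m0")
leader.write("m1")
watermark_before = f1.high_watermark
leader.propagate_high_watermark()
watermark_after = f1.high_watermark
```

Note that the followers hold both messages before they learn the new high watermark. That lag between data and watermark is what makes truncation necessary when a failed replica recovers.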
If the broker hosting a particular replica fails, that replica is removed from
the ISR and the leader no longer waits for its response. This is handled through
Kafka's use of ZooKeeper to maintain a set of “live” brokers. The leader
then continues to process messages using the smaller pool of replicas in the
ISR. When the replica recovers, it examines its last-known high watermark
and truncates its log to that position. The replica then copies data from
this position to the current committed offset of the leader. After the replica
has caught up, it may be added back to the ISR and everything proceeds as
before.
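The recovery steps can be sketched as a single function. The function name, its parameters, and the sample logs are hypothetical; logs are plain lists indexed by offset, and the high watermark is treated as the offset of the last committed message.

```python
def recover(replica_log, replica_high_watermark, leader_log, leader_committed_offset):
    """Rebuild a recovering replica's log from its last-known high watermark."""
    # Step 1: truncate. Anything past the last-known high watermark may never
    # have been committed, so the replica discards it.
    truncated = replica_log[: replica_high_watermark + 1]
    # Step 2: catch up. Copy from the truncation point up to the leader's
    # current committed offset.
    catch_up = leader_log[len(truncated): leader_committed_offset + 1]
    return truncated + catch_up


# The replica failed holding two entries beyond its last-known high watermark.
# The final entry, "x3", differs from what the leader later committed.
replica_log = ["m0", "m1", "m2", "x3"]
leader_log = ["m0", "m1", "m2", "m3", "m4"]

recovered = recover(
    replica_log,
    replica_high_watermark=1,
    leader_log=leader_log,
    leader_committed_offset=4,
)

# Once the replica matches the leader's committed log it can rejoin the ISR.
caught_up = recovered == leader_log[:5]
```

Truncating first means the divergent entry "x3" is never compared or merged; it is simply dropped and replaced with the leader's committed data.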