mutating a local copy into a format more suited to querying (such as a
B-Tree).
Space Management
Although disk space is inexpensive, it is neither free nor infinite, so
logs must eventually be removed from the system. Kafka, unlike many
systems, uses a time-based retention mechanism for its logs: log data,
usually grouped into 1 GB files unless otherwise specified, is removed
once it has been retained for a configured number of hours.
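The retention rule described above can be sketched in a few lines. This is a toy model, not Kafka's implementation: the `Segment` class, the `expired_segments` function, and the `retention_hours` parameter are hypothetical names chosen for illustration, and the 168-hour window is an assumed value.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str             # hypothetical segment file name
    last_write_ts: float  # epoch seconds of the newest message in the file


def expired_segments(segments, now, retention_hours):
    """Return the segments whose newest message is older than the window."""
    cutoff = now - retention_hours * 3600
    return [s for s in segments if s.last_write_ts < cutoff]


now = 1_000_000.0  # fixed "current time" so the example is deterministic
segments = [
    Segment("seg-0", now - 200 * 3600),  # last written 200 hours ago
    Segment("seg-1", now - 100 * 3600),  # last written 100 hours ago
    Segment("seg-2", now - 1 * 3600),    # last written 1 hour ago
]

# Assumed retention window of 168 hours (one week).
to_delete = expired_segments(segments, now, retention_hours=168)
print([s.name for s in to_delete])
```

Note that nothing in this rule looks at how much disk is in use; only age decides whether a file is removed.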
Because retention is based on time rather than size, Kafka will happily
fill its entire disk, and the 0.7.x series has a failure mode in which it
continues to accept messages despite being unable to persist them to
non-volatile storage. At first glance, this seems like a recipe for
disaster. However, it is generally much easier to manage available storage
than it is to manage potentially slow clients or requests for older data.
Disk usage is generally already tracked by core system monitoring, and a
shortage is easily remedied by adding more disk per broker or by adding
brokers and partitions.
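As a rough illustration of the kind of check a monitoring system might run, the sketch below uses Python's standard `shutil.disk_usage`. The `low_on_space` helper and the 15 percent threshold are assumptions made for this example, not values taken from Kafka or from any particular monitoring tool.

```python
import shutil


def low_on_space(total_bytes, free_bytes, min_free_fraction=0.15):
    """True when the free share of the volume drops below the threshold."""
    return free_bytes / total_bytes < min_free_fraction


# Check the volume holding the current directory; in practice the path
# would be the broker's log directory (left unspecified here).
usage = shutil.disk_usage(".")
alert = low_on_space(usage.total, usage.free)
print(f"free: {usage.free / usage.total:.1%}, alert: {alert}")
```

An alert from a check like this is the signal to add disk or brokers before the retention window fills the volume.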
Not a Queuing System
Kafka is very explicitly not a queuing system like ActiveMQ or RabbitMQ,
which go to a lot of trouble to ensure that messages are processed
in the order that they were received. Kafka's partitioning system does not
maintain this structure. There is no defined order of writes and reads
across the partitions of a particular topic, so there is no guarantee that
a client will read from partitions in the same order in which they were
written.
Additionally, it is not uncommon to implement producers asynchronously,
so a message sent to one partition may be written after a message sent to
another partition, despite having been sent first, because of differences
in latency or other nondeterministic events.
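A toy model makes the loss of global ordering concrete. The sketch below represents a topic as two plain Python lists; it is not a Kafka client, and the alternating assignment of messages to partitions is an assumption chosen only to keep the example deterministic.

```python
# Toy model: a topic is a list of partitions, each an append-only list.
partitions = [[], []]

# A producer sends messages numbered 0..5, alternating between partitions.
for seq in range(6):
    partitions[seq % 2].append(seq)

# A consumer happens to drain partition 1 before partition 0.
consumed = partitions[1] + partitions[0]

print(partitions)  # what each partition holds
print(consumed)    # the order the consumer actually sees
```

In this model each list keeps its own append order, but the interleaving across the two lists is lost as soon as the consumer reads them in a different order than the producer wrote to them.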
This is generally fine because, in many Internet applications, maintaining
explicit ordering is overkill. The order of events is essentially arbitrary to
begin with, so there is no real benefit to preserving one particular arbitrary
order over another. Furthermore, events whose ordering does matter usually
occur far enough apart that they still arrive in order at any processing
system.
Kafka also differs from many queuing systems in how it deals with message
consumers. In many queuing systems, messages are removed from the