Kafka Offset
In Apache Kafka, an offset is a unique number that identifies the position of a message within a partition. Think of it like a page number in a book. Each message in a Kafka partition gets a sequential number, starting from 0.
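To make the numbering concrete, here is a minimal sketch that models a partition as an append-only list. `PartitionLog` and its methods are hypothetical names chosen for illustration; this is a toy in-memory model, not the Kafka client API.

```python
class PartitionLog:
    """Toy in-memory stand-in for a single Kafka partition (not a Kafka API)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store the message and return the offset assigned to it."""
        offset = len(self._messages)  # next free position, starting at 0
        self._messages.append(message)
        return offset

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._messages[offset]


log = PartitionLog()
offsets = [log.append(m) for m in ["first", "second", "third"]]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # second
```

The offset comes from the log itself at append time, which is why the writer never has to choose it.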
Offsets help Kafka consumers know:
- Which messages have already been read
- Which message should be read next
- How to resume reading after a restart or failure
In simple words, an offset marks a position in a partition, and consumers use it to track how far they have read.
Key Features of Kafka Offset
- Unique per Partition: Offsets are unique only within a partition, not across the entire topic.
- Sequential Order: Kafka assigns offsets in increasing order (0, 1, 2, 3, and so on).
- Consumer Tracked: The broker assigns an offset when a message is appended, and each consumer tracks its own position. Producers do not choose offsets.
- Supports Replay: Consumers can reset offsets to re-read old messages if needed.
- Fault Tolerant: Kafka stores committed offsets in the internal __consumer_offsets topic, allowing consumers to resume after crashes.
- Independent Consumption: Different consumer groups can read the same messages using their own offsets.
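Three of these features (per-partition uniqueness, independent consumer groups, and replay) can be sketched with a small in-memory model. The names `topic`, `positions`, and `poll`, and the group names `billing` and `analytics`, are assumptions made for this example; a real consumer would use a Kafka client library instead.

```python
from collections import defaultdict

# Toy model: a topic maps partition number -> list of messages.
topic = {0: ["a", "b", "c"], 1: ["x", "y"]}

# Each (group, partition) pair has its own position: the next offset to read.
positions = defaultdict(int)


def poll(group, partition):
    """Return (offset, message) for the group's next unread message, or None."""
    offset = positions[(group, partition)]
    log = topic[partition]
    if offset >= len(log):
        return None  # the group has caught up on this partition
    positions[(group, partition)] = offset + 1
    return offset, log[offset]


# Offset 0 exists in both partitions, so an offset alone does not identify a message.
first_p0 = poll("billing", 0)  # (0, "a")
first_p1 = poll("billing", 1)  # (0, "x")

# A second group has its own position and reads the same message independently.
analytics_first = poll("analytics", 0)  # (0, "a")

# Replay: advance the billing group, then move its position back to 0.
poll("billing", 0)  # (1, "b")
positions[("billing", 0)] = 0
replayed = poll("billing", 0)  # (0, "a")
```

A message is only identified unambiguously by topic, partition, and offset together, and a position is only meaningful for a given consumer group.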
Simple Example of Kafka Offset
Let’s walk through how offsets work with a simple example.
Scenario
- Topic name: order-topic
- Partition: Partition 0
Messages in the Partition
| Offset | Message |
|---|---|
| 0 | Order Created |
| 1 | Payment Received |
| 2 | Order Packed |
| 3 | Order Shipped |
How Consumer Uses Offset
If a consumer has successfully processed messages up to offset 1, it means:
- Messages with offset 0 and 1 are already consumed
- The next message to read will be offset 2
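If the consumer has committed that progress and then crashes and restarts, it continues from offset 2 instead of starting from the beginning. Kafka records a committed offset as the next offset to read, so the stored value here is 2, not 1.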
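The restart scenario can be sketched as follows. This is a simplified simulation: the `committed` dictionary stands in for Kafka's store of committed offsets, and `run_consumer` and the group name `order-service` are hypothetical.

```python
messages = ["Order Created", "Payment Received", "Order Packed", "Order Shipped"]

committed = {}  # stand-in for the broker-side store of committed offsets


def run_consumer(group, max_messages):
    """Process up to max_messages, committing after each one.

    Returns the (offset, message) pairs handled in this run.
    """
    position = committed.get(group, 0)  # resume point, or 0 for a new group
    processed = []
    while position < len(messages) and len(processed) < max_messages:
        processed.append((position, messages[position]))
        position += 1
        committed[group] = position  # commit the NEXT offset to read
    return processed


# First run stops after two messages, simulating a crash.
first_run = run_consumer("order-service", max_messages=2)
print(committed["order-service"])  # 2

# After the restart the consumer picks up at offset 2, not at 0.
second_run = run_consumer("order-service", max_messages=10)
print(second_run)  # [(2, 'Order Packed'), (3, 'Order Shipped')]
```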
Kafka Offset Commit
When a consumer finishes processing messages, it commits the offset. This tells Kafka, “I have safely processed messages up to this point.”
Offsets can be committed in two ways:
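- Automatic Commit: The consumer client commits offsets in the background at a fixed interval (enabled with enable.auto.commit, with the interval set by auto.commit.interval.ms)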
- Manual Commit: The application explicitly commits offsets after processing
Manual commit gives better control over exactly when progress is recorded, which is why it is commonly used in production systems.
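Why commit timing matters can be seen in a small simulation. The sketch below is an illustrative model, not Kafka code: `consume` is a hypothetical function that crashes once at a chosen offset and then restarts from the committed offset. It compares committing before processing with committing after processing.

```python
messages = ["m0", "m1", "m2"]


def consume(commit_before_processing, crash_offset):
    """Simulate one crash at crash_offset, then a restart.

    Returns every offset whose processing actually ran, in order.
    """
    committed = 0  # next offset to read, as stored by the toy broker
    handled = []
    crashed = False
    while committed < len(messages):
        position = committed  # (re)start from the committed offset
        while position < len(messages):
            if commit_before_processing:
                committed = position + 1
                if position == crash_offset and not crashed:
                    crashed = True
                    break  # crash after the commit, before processing
                handled.append(position)
            else:
                handled.append(position)
                if position == crash_offset and not crashed:
                    crashed = True
                    break  # crash after processing, before the commit
                committed = position + 1
            position += 1
    return handled


print(consume(commit_before_processing=True, crash_offset=1))   # [0, 2]
print(consume(commit_before_processing=False, crash_offset=1))  # [0, 1, 1, 2]
```

Committing first skips offset 1 entirely (at-most-once delivery), while committing afterwards processes it twice (at-least-once delivery). Committing manually after processing is the usual way to choose the second behavior, which works best when the processing is safe to repeat.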
Why Kafka Offset is Important
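- Helps prevent message loss when offsets are committed only after processing succeeds
- Limits duplicate processing after a restart, although messages that were processed but not yet committed can be delivered again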
- Enables reliable and scalable data processing
- Supports consumer recovery and reprocessing
Summary
A Kafka offset is a number that represents the position of a message in a partition. It helps consumers track what they have already read and what comes next. Offsets make Kafka reliable, fault-tolerant, and flexible by allowing message replay and safe recovery after failures.
In short, Kafka offsets are the backbone of message tracking in Kafka.