What The Kafka!

In this article, we'll explore some of the basic concepts of Apache Kafka with hand-drawn illustrations and cover the terms most commonly used with Kafka.

A typical messaging system sends a message point-to-point from the sender to the receiver.

If the sender wants to send data to multiple receivers, it has to duplicate information and send it separately to each receiver.

Clearly this system doesn’t scale well and is grossly inefficient. This is where a publish-subscribe messaging system comes into the picture. In this system, the publisher sends a message once to a node, and one or more subscribers listen to that node.
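To make the difference from point-to-point delivery concrete, here is a minimal in-memory sketch of the publish-subscribe idea. It is not Kafka code: the `Node` class, the `orders` channel name, and the callbacks are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class Node:
    """Toy publish-subscribe node (illustrative only, not a Kafka API)."""

    def __init__(self):
        # channel name -> list of subscriber callbacks
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # The publisher sends the message once; the node fans it out.
        receivers = self.subscribers[channel]
        for callback in receivers:
            callback(message)
        return len(receivers)


node = Node()
inbox_a, inbox_b = [], []
node.subscribe("orders", inbox_a.append)
node.subscribe("orders", inbox_b.append)

# One publish call reaches both subscribers.
delivered = node.publish("orders", "order created")
```

The publisher never needs to know how many subscribers exist, which is what removes the duplication described earlier.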

In the Kafka world, a publisher is called a Producer, and a subscriber is called a Consumer. In the real world, there can be one or more Producers sending messages to a node, and one or more Consumers (which can be organized into a Consumer group) subscribing to messages from a node. A message in Kafka is called a Record. The key and value of each record are byte arrays, so a record can carry objects of any format.
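Because the payload of a record is just bytes, the producer and consumer have to agree on how objects are encoded. The sketch below assumes JSON encoded as UTF-8, which is only one possible choice; the `serialize` and `deserialize` helper names are hypothetical and not part of any Kafka client.

```python
import json


def serialize(event: dict) -> bytes:
    """Producer side: turn an object into the bytes stored in a record."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Consumer side: turn the record bytes back into an object."""
    return json.loads(payload.decode("utf-8"))


event = {"user": "alice", "action": "login"}
value = serialize(event)       # bytes, ready to be sent as a record value
restored = deserialize(value)  # the consumer recovers the original object
```

Any other format works the same way, as long as both sides use matching encode and decode steps.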

A node in Kafka is called a Broker, which hosts one or more Topics. Topics are categories to which producers send records and to which consumers subscribe. Each topic is divided into Partitions so that multiple consumers can read from the same topic in parallel.
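One common way to spread records over partitions is to hash a record key, so that records with the same key always land in the same partition. The sketch below illustrates that idea with `zlib.crc32` as a stand-in hash; this is an assumption for illustration, since real Kafka clients use their own hash function. The `choose_partition` helper and the sample keys are hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index (crc32 is a stand-in hash)."""
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 2
# A toy topic: partition index -> list of (key, value) records
topic = {p: [] for p in range(NUM_PARTITIONS)}

records = [(b"alice", b"login"), (b"bob", b"login"), (b"alice", b"logout")]
for key, value in records:
    topic[choose_partition(key, NUM_PARTITIONS)].append((key, value))

# Both "alice" records end up in the same partition, in the order sent.
alice_partition = choose_partition(b"alice", NUM_PARTITIONS)
```

With the records split this way, one consumer can read partition 0 while another reads partition 1.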

Each topic in the diagram has 2 partitions. A producer sends a record to a given Topic-Partition, and a consumer reads records from a Topic-Partition. Each partition is a log of records, and each new record is appended to the end of the log. The consumer can decide from which offset in the log it wants to start reading.
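The log-and-offset idea can be sketched with a plain list, where the offset of a record is simply its position in the list. This is a toy model under that assumption, not how a broker stores data on disk; the `PartitionLog` class and its method names are hypothetical.

```python
class PartitionLog:
    """Toy append-only log for one partition (illustrative only)."""

    def __init__(self):
        self._records = []

    def append(self, record: bytes) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
offsets = [log.append(r) for r in (b"first", b"second", b"third")]

from_start = log.read(0)   # a consumer replaying the whole log
from_middle = log.read(1)  # a consumer resuming from offset 1
```

Reading does not remove anything from the log, so different consumers can read the same partition from different offsets.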

Having a single broker is not good for fault tolerance, because everything stops if that Broker goes down. Hence, Kafka allows setting up multiple brokers. A group of brokers is called a Cluster. Management of brokers within a cluster is performed by ZooKeeper (newer Kafka versions can instead use the built-in KRaft mode, which removes the ZooKeeper dependency). A single ZooKeeper ensemble can manage one or more clusters.

In a multi-broker setup, each broker holds partitions of the topics. Each partition is replicated across brokers: one replica acts as the leader and receives the producers' writes for that partition, while the replicas on the other brokers are followers that copy records from the leader. If the broker hosting the leader goes down, one of the followers takes over as the leader. This allows us to build a scalable, distributed, and fault-tolerant architecture.
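The failover behaviour can be sketched as a toy model of a single replicated partition. It makes strong simplifying assumptions: every write is copied to all replicas immediately, and the next surviving broker in order becomes the leader. A real cluster tracks which replicas are in sync and coordinates the election, so treat the `ReplicatedPartition` class and the broker names below as hypothetical.

```python
class ReplicatedPartition:
    """Toy model of one partition replicated across brokers."""

    def __init__(self, brokers):
        # broker name -> that broker's copy of the partition log
        self.logs = {broker: [] for broker in brokers}
        self.leader = brokers[0]

    def append(self, record):
        # Simplification: the write reaches every replica at once.
        for log in self.logs.values():
            log.append(record)

    def fail(self, broker):
        """Remove a broker; promote a surviving replica if it was the leader."""
        del self.logs[broker]
        if broker == self.leader:
            if not self.logs:
                raise RuntimeError("no replicas left for this partition")
            self.leader = next(iter(self.logs))

    def read(self):
        """Read the log as served by the current leader."""
        return list(self.logs[self.leader])


partition = ReplicatedPartition(["broker-1", "broker-2", "broker-3"])
partition.append("r0")
partition.append("r1")

partition.fail("broker-1")          # the leader's broker goes down
new_leader = partition.leader       # a surviving replica takes over
records_after_failover = partition.read()
```

Because the followers already hold copies of the records, the partition stays readable after the original leader is lost.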

For more information, read the official quick start guide of Apache Kafka.